hana_ml.algorithms.pal.tsa.stationarity_test.stationarity_test(data, key=None, endog=None, method=None, mode=None, lag=None, probability=None)

Stationarity means that a time series has a constant mean and constant variance over time. For many time series models, the input data has to be stationary for reasonable analysis.
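
As an informal illustration of that definition (not the KPSS or ADF test itself), the following sketch compares the mean and variance of the two halves of a series. The helper name `half_stats` and both toy series are hypothetical and are not part of hana_ml.

```python
from statistics import fmean, pvariance


def half_stats(series):
    """Return (mean, variance) for the first and the second half of a series."""
    mid = len(series) // 2
    first, second = series[:mid], series[mid:]
    return (fmean(first), pvariance(first)), (fmean(second), pvariance(second))


# Toy data (made up for illustration only).
flat = [1.0, -1.0] * 4                    # oscillates around a fixed level
trending = [float(t) for t in range(8)]   # level drifts upward over time

print(half_stats(flat))      # both halves share the same mean and variance
print(half_stats(trending))  # the mean differs between the two halves
```

A series whose halves disagree this clearly is unlikely to be level-stationary. A formal decision still needs a statistical test such as the ones this function offers.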


data : DataFrame

Input data containing at least two columns: an ID column and a column of raw data.

key : str, optional

The ID (time stamp) column. The IDs do not need to be sorted, but they must be unique and equally spaced. The supported data type is INTEGER.

Defaults to the first column of data if the index column of data is not provided. Otherwise, defaults to the index column of data.

endog : str, optional

The column of series to be tested.

Defaults to the first non-key column.

method : str, optional

Statistical test used to determine stationarity. The options are "kpss" and "adf".

Defaults to "kpss".

mode : str, optional

Type of stationarity to determine. The options are "level", "trend" and "no". Note that option "no" is not applicable to "kpss".

Defaults to "level".

lag : int, optional

The lag order used to calculate the test statistic.

Defaults to int(12*(data_length / 100)^0.25) for "kpss" and int(4*(data_length / 100)^(2/9)) for "adf".

probability : float, optional

The confidence level for confirming stationarity.

Defaults to 0.9.
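
The default lag formulas and the restriction on mode can be sketched in plain Python as follows. The helpers `default_lag` and `check_mode` are hypothetical and are not part of hana_ml. The sketch assumes that `^` in the formulas denotes exponentiation and that `int()` truncates as it does in Python.

```python
# Modes accepted by each method, as described above ("no" is not valid for "kpss").
VALID_MODES = {
    "kpss": {"level", "trend"},
    "adf": {"level", "trend", "no"},
}


def default_lag(method, data_length):
    """Evaluate the documented default lag formula for the given method."""
    ratio = data_length / 100
    if method == "kpss":
        return int(12 * ratio ** 0.25)    # '^' in the text is '**' in Python
    if method == "adf":
        return int(4 * ratio ** (2 / 9))
    raise ValueError(f"unknown method: {method!r}")


def check_mode(method, mode):
    """Raise ValueError if the method/mode combination is not documented as valid."""
    if method not in VALID_MODES:
        raise ValueError(f"unknown method: {method!r}")
    if mode not in VALID_MODES[method]:
        raise ValueError(f"mode {mode!r} is not applicable to {method!r}")


print(default_lag("kpss", 100), default_lag("adf", 100))
check_mode("adf", "no")   # accepted: "no" is allowed for "adf"
```

Checking the combination on the client side before calling stationarity_test() is optional. It only surfaces an invalid method and mode pair earlier.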

Statistics for time series, structured as follows:
  • STATS_NAME: Name of the statistics of the series.

  • STATS_VALUE: Value of the corresponding statistic.


Time series data df:

>>> df.head(3).collect()
   TIME_STAMP  SERIES
0      0           0.0
1      1           1.00
2      2           1586.00

Perform stationarity_test():

>>> stats = stationarity_test(df, endog='SERIES', key='TIME_STAMP',
                              method='kpss', mode='trend', lag=5, probability=0.95)


>>> stats.head(3).collect()
     STATS_NAME     STATS_VALUE
0    stationary     0
1    kpss_stat      0.26801
2    p-value        0.01
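
To read individual statistics, it can help to turn the name/value rows into a dictionary. The sketch below works on rows copied from the output above rather than on a live result, because stats.collect() needs an SAP HANA connection. The helper `stats_to_dict` is hypothetical. Storing the values as strings and treating a "stationary" value of 1 as "stationary" are both assumptions.

```python
# Rows copied from the example output; the string type of the values is an assumption.
rows = [
    ("stationary", "0"),
    ("kpss_stat", "0.26801"),
    ("p-value", "0.01"),
]


def stats_to_dict(rows):
    """Map each STATS_NAME to its STATS_VALUE, converted to float when possible."""
    result = {}
    for name, value in rows:
        try:
            result[name] = float(value)
        except (TypeError, ValueError):
            result[name] = value   # keep non-numeric values unchanged
    return result


result = stats_to_dict(rows)
# Assumption: 1 means the series was judged stationary, 0 means it was not.
is_stationary = result["stationary"] == 1.0
print(result["kpss_stat"], result["p-value"], is_stationary)
```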