stationarity_test
- hana_ml.algorithms.pal.tsa.stationarity_test.stationarity_test(data, key=None, endog=None, method=None, mode=None, lag=None, probability=None)
Stationarity means that a time series has a constant mean and a constant variance over time. Many time series models require stationary input data for the analysis to be meaningful.
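The definition can be illustrated informally by comparing the mean and variance of the first and second halves of a series. The sketch below is only a rough heuristic. The helper name `half_split_summary` is hypothetical and is not part of hana_ml, and the check is not a substitute for the formal tests offered by stationarity_test().

```python
import statistics


def half_split_summary(values):
    """Return (mean, variance) for the first and second halves of a series.

    Hypothetical helper, an informal illustration of the stationarity
    definition: for a stationary series both halves should have similar
    means and variances. It is not a statistical test.
    """
    if len(values) < 4:
        raise ValueError("need at least 4 observations")
    mid = len(values) // 2
    first, second = values[:mid], values[mid:]
    return (
        (statistics.fmean(first), statistics.pvariance(first)),
        (statistics.fmean(second), statistics.pvariance(second)),
    )


# A series with a linear trend: the mean shifts between the two halves.
trending = [float(t) for t in range(20)]
print(half_split_summary(trending))  # ((4.5, 8.25), (14.5, 8.25))
```

The variance is the same in both halves here, but the shifting mean already shows that the trending series is not stationary.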
- Parameters:
- data : DataFrame
Input data containing at least two columns: an ID column and a column of raw data.
- key : str, optional
The ID (time stamp) column. IDs need not be in order, but must be unique and equally spaced. The supported data type is INTEGER.
Defaults to the index column of data if one is set; otherwise, defaults to the first column of data.
- endog : str, optional
The column of series to be tested.
Defaults to the first non-key column.
- method : str, optional
Statistical test used to determine stationarity. The options are "kpss" and "adf".
Defaults to "kpss".
- mode : str, optional
Type of stationarity to determine. The options are "level", "trend" and "no". Note that option "no" is not applicable to "kpss".
Defaults to "level".
- lag : int, optional
The lag order used to calculate the test statistic.
Defaults to int(12*(data_length/100)^0.25) for "kpss" and int(4*(data_length/100)^(2/9)) for "adf".
- probability : float, optional
The confidence level for confirming stationarity.
Defaults to 0.9.
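The default lag formulas given for `lag` can be reproduced in plain Python, for example to predict which lag will be used when `lag` is omitted. This sketch assumes that `data_length` is the number of rows in the series, that `^` in the formulas denotes exponentiation, and that `int()` truncates toward zero as Python's `int()` does. The function name `default_lag` is hypothetical and not part of hana_ml.

```python
def default_lag(data_length, method="kpss"):
    """Default lag order per the documented formulas (hypothetical helper).

    kpss: int(12 * (data_length / 100) ** 0.25)
    adf:  int(4 * (data_length / 100) ** (2 / 9))
    """
    if data_length <= 0:
        raise ValueError("data_length must be positive")
    ratio = data_length / 100
    if method == "kpss":
        return int(12 * ratio ** 0.25)
    if method == "adf":
        return int(4 * ratio ** (2 / 9))
    raise ValueError('method must be "kpss" or "adf"')


print(default_lag(100, "kpss"), default_lag(100, "adf"))  # 12 4
```

For a 100-row series both ratios equal 1, so the defaults reduce to the leading constants: 12 for "kpss" and 4 for "adf".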
- Returns:
- DataFrame
- Test statistics for the time series, structured as follows:
STATS_NAME: Name of the statistic.
STATS_VALUE: Value of the corresponding statistic.
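Because the result is a two-column name/value table, it is often convenient to turn it into a dictionary after collecting it. The sketch below works on plain `(STATS_NAME, STATS_VALUE)` pairs. The helper names `stats_to_dict` and `is_stationary` are hypothetical, and the sketch assumes, based on the example output in this section, that the `stationary` entry holds 1 for stationary and 0 for non-stationary. Values are passed through `float()` in case the column comes back as text, which is also an assumption.

```python
def stats_to_dict(rows):
    """Convert (STATS_NAME, STATS_VALUE) pairs into {name: float}."""
    return {str(name): float(value) for name, value in rows}


def is_stationary(summary):
    """Interpret the 'stationary' flag, assuming 1 means stationary."""
    return summary["stationary"] == 1.0


# Values copied from the example output, written as strings to mimic a text column.
rows = [("stationary", "0"), ("kpss_stat", "0.26801"), ("p-value", "0.01")]
summary = stats_to_dict(rows)
print(summary["kpss_stat"], is_stationary(summary))  # 0.26801 False
```

With a collected pandas DataFrame `pdf = stats.collect()`, the pairs can be obtained with `zip(pdf["STATS_NAME"], pdf["STATS_VALUE"])`.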
Examples
Time series data df:
>>> df.head(3).collect()
   TIME_STAMP   SERIES
0           0     0.00
1           1     1.00
2           2  1586.00
Perform stationarity_test():
>>> from hana_ml.algorithms.pal.tsa.stationarity_test import stationarity_test
>>> stats = stationarity_test(df, endog='SERIES', key='TIME_STAMP', method='kpss', mode='trend', lag=5, probability=0.95)
Outputs:
>>> stats.head(3).collect()
   STATS_NAME  STATS_VALUE
0  stationary            0
1   kpss_stat      0.26801
2     p-value         0.01
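To make the kpss_stat value more concrete, the following pure-Python sketch computes the textbook KPSS statistic. It takes OLS residuals on a constant ("level") or on a constant plus a linear time trend ("trend"), forms their partial sums, and divides by a Bartlett-weighted long-run variance using the given lag. This reflects only the general formulation; SAP PAL's internal implementation, and therefore its exact numbers, may differ. The function name `kpss_statistic` is hypothetical, and the sketch does not compute p-values.

```python
def kpss_statistic(values, mode="level", lag=0):
    """Textbook KPSS statistic (hypothetical sketch, not SAP PAL's code)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")
    if mode not in ("level", "trend"):
        raise ValueError('mode must be "level" or "trend" for KPSS')
    y_bar = sum(values) / n
    if mode == "level":
        # Residuals from regressing on a constant: deviations from the mean.
        resid = [y - y_bar for y in values]
    else:
        # Residuals from OLS on a constant and a time index t = 1..n.
        ts = range(1, n + 1)
        t_bar = (n + 1) / 2
        sxx = sum((t - t_bar) ** 2 for t in ts)
        sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, values))
        slope = sxy / sxx
        intercept = y_bar - slope * t_bar
        resid = [y - intercept - slope * t for t, y in zip(ts, values)]

    # Partial sums S_t of the residuals.
    partial, running = [], 0.0
    for e in resid:
        running += e
        partial.append(running)

    # Long-run variance with Bartlett weights w(s) = 1 - s / (lag + 1).
    lrv = sum(e * e for e in resid) / n
    for s in range(1, lag + 1):
        weight = 1 - s / (lag + 1)
        cov = sum(resid[t] * resid[t - s] for t in range(s, n)) / n
        lrv += 2 * weight * cov
    if lrv <= 0:
        raise ValueError("long-run variance is not positive")
    return sum(st * st for st in partial) / (n * n * lrv)


noise_like = [1.0, -1.0] * 50
print(kpss_statistic(noise_like, mode="level", lag=0))
```

Larger statistics argue against stationarity. For reference, the commonly cited asymptotic 5% critical values are about 0.463 for the level case and 0.146 for the trend case. As in the parameter description above, the sketch rejects mode "no" for KPSS.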