white_noise_test

hana_ml.algorithms.pal.tsa.white_noise_test.white_noise_test(data, key=None, endog=None, lag=None, probability=None, thread_ratio=None, model_df=None)

This algorithm tests whether a time series is a white noise series. It returns 1 if the series is white noise and 0 otherwise.

Parameters:
data : DataFrame

Input data containing at least two columns: an ID column and a column of raw data.

key : str, optional

The ID column.

Defaults to the index column of data if one is set. Otherwise, defaults to the first column of data.

endog : str, optional

The column of series to be tested.

Defaults to the first non-key column.

lag : int, optional

Specifies the number of autocorrelation lags on which the statistic is based.

It corresponds to the degrees of freedom of the chi-square distribution.

Defaults to half of the sample size (n/2).

probability : float, optional

The confidence level used for the chi-square distribution.

The value is 1 - α, where α is the significance level.

Defaults to 0.9.

thread_ratio : float, optional

The ratio of available threads.

  • 0: single thread.

  • 0~1: percentage of available threads.

  • Others: heuristically determined.

Defaults to -1.

model_df : int, optional

Specifies the number of degrees of freedom occupied by a model.

Should be provided if the input data are the residuals of a raw time series after a model has been fitted to it.

Defaults to 0.

Returns:
DataFrame

Statistics for the time series, structured as follows:

  • STAT_NAME: Name of the statistics of the series.

  • STAT_VALUE: Value of the statistic. The following statistics are reported:

    • WN: 1 for white noise, 0 for not white noise.

    • Q: the Q statistic of the test.

    • chi^2: the chi-square critical value at the given probability.
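The Q statistic is not defined on this page. As a minimal sketch, assuming it is the Ljung-Box statistic Q = n(n+2) Σ ρ_h² / (n - h) summed over lags h = 1..lag (an assumption, not confirmed by this section), it could be computed in plain Python as follows. The function name ljung_box_q is hypothetical and is not part of hana_ml.

```python
def ljung_box_q(series, lag):
    """Ljung-Box Q statistic over autocorrelations at lags 1..lag.

    Illustrative only: assumes the Q reported by white_noise_test
    follows the Ljung-Box definition.
    """
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    # Sum of squared deviations, the denominator of every autocorrelation
    denom = sum(c * c for c in centered)
    total = 0.0
    for h in range(1, lag + 1):
        # Sample autocorrelation at lag h
        rho = sum(centered[t] * centered[t - h] for t in range(h, n)) / denom
        total += rho * rho / (n - h)
    return n * (n + 2) * total


q = ljung_box_q([1.0, 2.0, 3.0, 4.0], lag=1)
print(q)
```

Under this assumption, the series would be reported as white noise (WN = 1) when Q does not exceed the chi^2 value.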

Examples

Time series data df:

>>> df.head(3).collect()
  TIME_STAMP    SERIES
0          0   1356.00
1          1    826.00
2          2   1586.00

Call the white_noise_test function:

>>> from hana_ml.algorithms.pal.tsa.white_noise_test import white_noise_test
>>> stats = white_noise_test(data=df,
                             endog='SERIES',
                             model_df=1,
                             lag=3,
                             probability=0.9,
                             thread_ratio=0.2)

Outputs:

>>> stats.collect()
   STAT_NAME    STAT_VALUE
0         WN      0.000000
1          Q      5.576053
2      chi^2      4.605170
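The printed values can be checked by hand. The sketch below assumes the chi^2 row is the chi-square quantile at the given probability with lag - model_df degrees of freedom. That reading is an inference from the numbers above, not something this page states. With 2 degrees of freedom the quantile has the closed form -2 ln(1 - p), so no statistics library is needed.

```python
import math

# Inputs taken from the example call and its output above
lag, model_df, probability = 3, 1, 0.9
q_stat = 5.576053

# Assumed degrees of freedom: lags minus those used by the fitted model
dof = lag - model_df

# Closed-form chi-square quantile, valid only for 2 degrees of freedom
assert dof == 2
critical = -2.0 * math.log(1.0 - probability)

# White noise is not rejected only if Q stays at or below the critical value
is_white_noise = int(q_stat <= critical)
print(round(critical, 6), is_white_noise)  # 4.60517 0
```

Here Q exceeds the critical value, which matches the WN value of 0 in the output.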