correlation

hana_ml.algorithms.pal.tsa.correlation_function.correlation(data, key=None, x=None, y=None, thread_ratio=None, method=None, max_lag=None, calculate_pacf=None, calculate_confint=False, alpha=None, bartlett=None)

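Computes the correlation function of time series data: the autocovariance and autocorrelation (and optionally the partial autocorrelation) of a single series, or the cross-covariance and cross-correlation of two series.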

Parameters:
data : DataFrame

Input data.

key : str, optional

Name of the ID column.

Defaults to the index column of data (i.e. data.index) if it is set.

x : str, optional

Name of the column that holds the first series.

y : str, optional

Name of the column that holds the second series.

thread_ratio : float, optional

Proportion of available threads to use.

  • 0: single thread

  • 0~1: percentage of available threads

  • Others: heuristically determined

Valid only when method is set to 'brute_force'.

Defaults to -1.

method : {'auto', 'brute_force', 'fft'}, optional

Method used to calculate the correlation function.

Defaults to 'auto'.

max_lag : int, optional

Maximum lag for the correlation function.

Defaults to sqrt(n), where n is the number of data points.

calculate_pacf : bool, optional

Controls whether to calculate the Partial Autocorrelation Coefficient (PACF).

Valid only when a single series is provided.

Defaults to True.

calculate_confint : bool, optional

Controls whether to calculate confidence intervals.

If True, the result contains two additional columns with the confidence intervals.

Defaults to False.

alpha : float, optional

Significance level of the confidence bound. For instance, if alpha=0.05, the 95% confidence bound is returned.

Valid only when calculate_confint is True.

Defaults to 0.05.

bartlett : bool, optional

Specifies how the confidence bound is calculated.

  • False: use standard error to calculate the confidence bound.

  • True: use Bartlett's formula to calculate the confidence bound.

Valid only when calculate_confint is True.

Defaults to True.
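
How alpha and bartlett interact can be illustrated with a minimal sketch in plain Python. It assumes the textbook formulation: a two-sided normal quantile, combined with either the plain standard error 1/sqrt(n) or Bartlett's formula. The exact computation performed by PAL is not spelled out here, and the helper name acf_confidence_bounds is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def acf_confidence_bounds(acf, n, alpha=0.05, bartlett=True):
    """Half-width of the (1 - alpha) confidence band at each lag of an ACF.

    acf[0] is the lag-0 value (always 1.0); n is the length of the series.
    Hypothetical helper, not part of hana_ml.
    """
    # Two-sided quantile of the standard normal, e.g. about 1.96 for alpha=0.05.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    bounds = []
    for k in range(len(acf)):
        if k == 0:
            bounds.append(0.0)  # lag 0 is exactly 1, so there is no band
        elif bartlett:
            # Bartlett: the variance grows with the squared ACF at earlier lags.
            var = (1.0 + 2.0 * sum(r * r for r in acf[1:k])) / n
            bounds.append(z * sqrt(var))
        else:
            # Plain standard error, identical at every lag.
            bounds.append(z / sqrt(n))
    return bounds


acf = [1.0, 0.5, 0.25]
standard = acf_confidence_bounds(acf, n=100, alpha=0.05, bartlett=False)
widened = acf_confidence_bounds(acf, n=100, alpha=0.05, bartlett=True)
```

With the standard error the band is the same at every lag, whereas Bartlett's formula widens it as earlier lags show stronger correlation.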

Returns:
DataFrame

Result of the correlation function, structured as follows:

  • LAG: ID column.

  • CV: Autocovariance (ACV) or cross-covariance (CCV).

  • CF: Autocorrelation (ACF) or cross-correlation (CCF).

  • PACF: Partial autocorrelation. Null if cross-correlation is calculated.

  • ACF_CONFIDENCE_BOUND: Confidence intervals of ACF. Present only when calculate_confint = True.

  • PACF_CONFIDENCE_BOUND: Confidence intervals of PACF. Present only when calculate_confint = True.
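
How the CV and CF columns relate can be sketched in plain Python for the single-series case. The sketch assumes a brute-force estimator that divides by n; the normalization PAL actually applies is not stated here, and both function names are hypothetical. Dividing each covariance by the lag-0 covariance is consistent with the output in the Examples section, where CF at lag 1 equals CV at lag 1 divided by CV at lag 0.

```python
def autocovariance(series, max_lag):
    """Brute-force autocovariance for lags 0..max_lag (assumed 1/n scaling)."""
    n = len(series)
    mean = sum(series) / n
    centered = [v - mean for v in series]
    return [
        sum(centered[t] * centered[t + k] for t in range(n - k)) / n
        for k in range(max_lag + 1)
    ]


def autocorrelation(series, max_lag):
    """Autocorrelation: each autocovariance divided by the lag-0 value."""
    cv = autocovariance(series, max_lag)
    return [c / cv[0] for c in cv]


series = [1.0, 2.0, 3.0, 4.0, 5.0]
cv = autocovariance(series, max_lag=2)
cf = autocorrelation(series, max_lag=2)
```

By construction the lag-0 entry of cf is 1, which matches the first row of the CF column in the example output.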

Examples

Input data:

>>> df.collect().head(10)
    ID      X
0    1   88.0
1    2   84.0
2    3   85.0
3    4   85.0
4    5   84.0
5    6   85.0
6    7   83.0
7    8   85.0
8    9   88.0
9   10   89.0

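Compute the correlation function of the input DataFrame:

>>> from hana_ml.algorithms.pal.tsa.correlation_function import correlation
>>> res = correlation(data=df,
...                   key='ID',
...                   x='X',
...                   thread_ratio=0.4,
...                   method='auto',
...                   calculate_pacf=True)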
>>> res.collect()
    LAG           CV        CF      PACF
0     0  1583.953600  1.000000  1.000000
1     1  1520.880736  0.960180  0.960180
2     2  1427.356272  0.901135 -0.266618
3     3  1312.695808  0.828746 -0.154417
4     4  1181.606944  0.745986 -0.120176
5     5  1041.042480  0.657243 -0.071546
6     6   894.493216  0.564722 -0.065065
7     7   742.178352  0.468561 -0.083686
8     8   587.453488  0.370878 -0.065213
9     9   434.287824  0.274180 -0.045501
10   10   286.464160  0.180854 -0.029586
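
As a cross-check of the output above, the PACF column can be recomputed from the CF column with the Durbin-Levinson recursion. This is a minimal sketch in plain Python; whether PAL uses this recursion internally is an assumption, and the function name pacf_from_acf is hypothetical. The input values are the first four CF entries copied from the result.

```python
def pacf_from_acf(acf):
    """Partial autocorrelations via the Durbin-Levinson recursion.

    acf[k] is the autocorrelation at lag k, with acf[0] == 1.0.
    Hypothetical helper, not part of hana_ml.
    """
    pacf = [1.0]
    phi_prev = []  # AR coefficients of the order-(k-1) fit
    for k in range(1, len(acf)):
        # Correlation at lag k not already explained by the lower-order fit.
        num = acf[k] - sum(phi_prev[j] * acf[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * acf[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients, then append the new one.
        phi_curr = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
        phi_curr.append(phi_kk)
        pacf.append(phi_kk)
        phi_prev = phi_curr
    return pacf


# CF values for lags 0..3, copied from the example output.
cf_head = [1.000000, 0.960180, 0.901135, 0.828746]
pacf_head = pacf_from_acf(cf_head)
```

Up to rounding of the printed CF values, the recomputed entries agree with the PACF column for lags 1 to 3.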