correlation
- hana_ml.algorithms.pal.tsa.correlation_function.correlation(data, key=None, x=None, y=None, thread_ratio=None, method=None, max_lag=None, calculate_pacf=None, calculate_confint=False, alpha=None, bartlett=None)
This correlation function gives the statistical correlation between random variables.
- Parameters
- data : DataFrame
Input data.
- key : str, optional
Name of the ID column.
Defaults to the index column of data (i.e. data.index) if it is set.
- x : str, optional
Name of the first series data column.
- y : str, optional
Name of the second series data column.
- thread_ratio : float, optional
The ratio of available threads.
0: single thread
0~1: percentage
Others: heuristically determined
Valid only when method is set as 'brute_force'.
Defaults to -1.
- method : {'auto', 'brute_force', 'fft'}, optional
Indicates the method to be used to calculate the correlation function.
Defaults to 'auto'.
- max_lag : int, optional
Maximum lag for the correlation function.
Defaults to sqrt(n), where n is the number of data points.
- calculate_pacf : bool, optional
Controls whether to calculate PACF or not.
Valid only when only one series is provided.
Defaults to True.
- calculate_confint : bool, optional
Controls whether to calculate confidence intervals or not.
If it is True, two additional columns of confidence intervals are shown in the result.
Defaults to False.
- alpha : float, optional
The confidence bound for the given level is returned. For instance, if alpha=0.05, the 95% confidence bound is returned.
Valid only when calculate_confint is True.
Defaults to 0.05.
- bartlett : bool, optional
False: uses the standard error to calculate the confidence bound.
True: uses Bartlett's formula to calculate the confidence bound.
Valid only when calculate_confint is True.
Defaults to True.
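To make the alpha and bartlett options more concrete, here is a minimal sketch of the two textbook ways to bound a sample ACF. It is an illustration only: the helper name `acf_confidence_bounds` is hypothetical, and the exact formulas PAL applies are not stated in this page, so the white-noise standard error 1/sqrt(n) and the usual form of Bartlett's formula are assumptions.

```python
import math
from statistics import NormalDist


def acf_confidence_bounds(acf, n, alpha=0.05, bartlett=True):
    """Half-widths of two-sided (1 - alpha) bounds for acf[1:].

    Hypothetical helper using textbook formulas (assumed, not taken from PAL).
    acf: autocorrelations with acf[0] == 1.0
    n:   number of observations in the series
    """
    # Two-sided normal quantile, e.g. about 1.96 for alpha = 0.05.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    bounds = []
    cum_sq = 0.0  # running sum of r_j^2 for 1 <= j < k
    for k in range(1, len(acf)):
        if bartlett:
            # Bartlett: Var(r_k) is approximately (1 + 2 * sum_{j<k} r_j^2) / n
            se = math.sqrt((1.0 + 2.0 * cum_sq) / n)
        else:
            # Plain standard error under a white-noise assumption
            se = 1.0 / math.sqrt(n)
        bounds.append(z * se)
        cum_sq += acf[k] ** 2
    return bounds


plain = acf_confidence_bounds([1.0, 0.5, 0.25], n=100, bartlett=False)
wide = acf_confidence_bounds([1.0, 0.5, 0.25], n=100, bartlett=True)
```

With the plain standard error every lag gets the same bound, whereas the Bartlett bounds widen as the lag grows because earlier autocorrelations accumulate in the variance term.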
- Returns
- DataFrame
Result of the correlation function, structured as follows:
LAG: ID column.
CV: ACV/CCV.
CF: ACF/CCF.
PACF: PACF. Null if cross-correlation is calculated.
ACF_CONFIDENCE_BOUND: Confidence intervals of the ACF. This column appears only when calculate_confint = True.
PACF_CONFIDENCE_BOUND: Confidence intervals of the PACF. This column appears only when calculate_confint = True.
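As a rough illustration of what the LAG, CV and CF columns contain for a single series, the sketch below computes the autocovariance and autocorrelation directly in plain Python. The function name `autocorrelation` is hypothetical, and two details are assumptions rather than documented behaviour: the 1/n normalisation of the autocovariance, and truncating sqrt(n) to an integer for the default max_lag.

```python
import math


def autocorrelation(x, max_lag=None):
    """Return (lag, autocovariance, autocorrelation) tuples, computed directly.

    Hypothetical helper; assumes the biased 1/n estimator, which may differ
    from what PAL uses internally.
    """
    n = len(x)
    if max_lag is None:
        # Documented default is sqrt(n); truncating to int is an assumption.
        max_lag = int(math.sqrt(n))
    mean = sum(x) / n
    dev = [v - mean for v in x]
    cv0 = sum(d * d for d in dev) / n  # lag-0 autocovariance (variance)
    rows = []
    for lag in range(max_lag + 1):
        cv = sum(dev[t] * dev[t + lag] for t in range(n - lag)) / n
        rows.append((lag, cv, cv / cv0))  # CF is CV scaled by the lag-0 CV
    return rows


# The ten X values listed in the example data of this page.
x = [88.0, 84.0, 85.0, 85.0, 84.0, 85.0, 83.0, 85.0, 88.0, 89.0]
rows = autocorrelation(x)
```

Because only ten values are used here, the numbers will not match the result table of the full example; the point is the relationship between the columns, in particular that CF at lag 0 is always 1.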
Examples
Data for correlation:
>>> df.collect().head(10)
   ID     X
0   1  88.0
1   2  84.0
2   3  85.0
3   4  85.0
4   5  84.0
5   6  85.0
6   7  83.0
7   8  85.0
8   9  88.0
9  10  89.0
Perform correlation function on the input dataframe:
>>> res = correlation(data=df, key='ID', x='X', thread_ratio=0.4, method='auto', calculate_pacf=True)
>>> res.collect()
    LAG           CV        CF      PACF
0     0  1583.953600  1.000000  1.000000
1     1  1520.880736  0.960180  0.960180
2     2  1427.356272  0.901135 -0.266618
3     3  1312.695808  0.828746 -0.154417
4     4  1181.606944  0.745986 -0.120176
5     5  1041.042480  0.657243 -0.071546
6     6   894.493216  0.564722 -0.065065
7     7   742.178352  0.468561 -0.083686
8     8   587.453488  0.370878 -0.065213
9     9   434.287824  0.274180 -0.045501
10   10   286.464160  0.180854 -0.029586
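The PACF column in the result above can be related to the CF column with the Durbin-Levinson recursion. The sketch below is a plain-Python illustration with a hypothetical helper name, `pacf_from_acf`. Whether PAL uses this recursion internally is not stated here. The inputs are the first CF values copied from the output, rounded to six decimals, so agreement can only be expected up to rounding.

```python
def pacf_from_acf(acf):
    """Durbin-Levinson recursion: partial autocorrelations from acf[0..K].

    Hypothetical helper for illustration; acf[0] is expected to be 1.0.
    """
    pacf = [1.0]
    phi_prev = []  # coefficients phi_{k-1,1}, ..., phi_{k-1,k-1}
    for k in range(1, len(acf)):
        # phi_kk = (r_k - sum_i phi_{k-1,i} r_{k-i}) / (1 - sum_i phi_{k-1,i} r_i)
        num = acf[k] - sum(phi_prev[j] * acf[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * acf[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # phi_{k,i} = phi_{k-1,i} - phi_kk * phi_{k-1,k-i}
        phi_prev = [
            phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)
        ] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


# CF values for lags 0 to 3, copied from the result table above.
cf = [1.000000, 0.960180, 0.901135, 0.828746]
pacf = pacf_from_acf(cf)
```

At lag 1 the partial autocorrelation equals the autocorrelation itself, which is why the CF and PACF columns coincide in the second row of the result.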