correlation
- hana_ml.algorithms.pal.tsa.correlation_function.correlation(data, key=None, x=None, y=None, thread_ratio=None, method=None, max_lag=None, calculate_pacf=None, calculate_confint=False, alpha=None, bartlett=None)
This correlation function gives the statistical correlation between random variables.
- Parameters:
- data : DataFrame
Input data.
- key : str, optional
Name of the ID column.
Defaults to the index column of data (i.e. data.index) if it is set.
- x : str, optional
The name of the first series of data columns.
- y : str, optional
The name of the second series of data columns.
- thread_ratio : float, optional
Adjusts the percentage of available threads to use, from 0 to 1. A value of 0 indicates the use of a single thread, while 1 implies the use of all currently available threads.
Values outside this range are ignored, and the function heuristically determines the number of threads to use. Valid only when method is set to 'brute_force'.
Defaults to -1.
- method : {'auto', 'brute_force', 'fft'}, optional
Indicates the method to be used to calculate the correlation function.
Defaults to 'auto'.
- max_lag : int, optional
Maximum lag for the correlation function.
Defaults to sqrt(n), where n is the number of data points.
- calculate_pacf : bool, optional
Controls whether to calculate the Partial Autocorrelation Coefficient (PACF).
Valid only when only one series is provided.
Defaults to True.
- calculate_confint : bool, optional
Controls whether to calculate confidence intervals or not.
If it is True, two additional columns of confidence intervals are shown in the result.
Defaults to False.
- alpha : float, optional
Significance level of the confidence bound. For instance, if alpha=0.05, the 95% confidence bound is returned.
Valid only when calculate_confint is True.
Defaults to 0.05.
- bartlett : bool, optional
False: use the standard error to calculate the confidence bound.
True: use Bartlett's formula to calculate the confidence bound.
Valid only when calculate_confint is True.
Defaults to True.
- Returns:
- DataFrame
Result of the correlation function, structured as follows:
LAG: ID column.
CV: ACV/CCV.
CF: ACF/CCF.
PACF: PACF. Null if cross-correlation is calculated.
ACF_CONFIDENCE_BOUND: Confidence intervals of ACF. Shown only when calculate_confint = True.
PACF_CONFIDENCE_BOUND: Confidence intervals of PACF. Shown only when calculate_confint = True.
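To make the CV and CF columns concrete, the following is a minimal NumPy sketch of a brute-force autocovariance and autocorrelation for a single series. It is an illustration only, not the PAL implementation: the helper name acf_sketch, the biased 1/n normalization, and rounding sqrt(n) down for the default max_lag are all assumptions.

```python
import math
import numpy as np

def acf_sketch(x, max_lag=None):
    """Hypothetical helper: autocovariance (CV) and autocorrelation (CF) for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if max_lag is None:
        # Assumption: the sqrt(n) default is rounded down to an integer.
        max_lag = math.isqrt(n)
    d = x - x.mean()
    # Assumption: biased estimator that divides by n at every lag.
    cv = np.array([np.dot(d[: n - k], d[k:]) / n for k in range(max_lag + 1)])
    # Autocorrelation is the autocovariance normalized by the lag-0 value.
    cf = cv / cv[0]
    return cv, cf

series = [1.0, 2.0, 3.0, 4.0, 5.0, 4.0, 3.0, 2.0, 1.0]
cv, cf = acf_sketch(series)
```

With nine data points the default max_lag is 3, so the sketch returns four values per column (lags 0 through 3), and the lag-0 autocorrelation is always 1.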
Examples
>>> res = correlation(data=df, key='ID', x='X', thread_ratio=0.4, method='auto', calculate_pacf=True)
>>> res.collect()
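The effect of the alpha and bartlett parameters on the ACF confidence bound can be sketched in plain Python. This assumes the textbook formulas: a standard error of 1/sqrt(n) when bartlett is False, and Bartlett's approximation Var(r_k) ≈ (1 + 2 * sum of r_j^2 for j < k) / n when it is True. Whether PAL uses exactly these expressions is not confirmed here, and acf_bounds_sketch is a hypothetical helper, not part of hana_ml.

```python
import math
from statistics import NormalDist

def acf_bounds_sketch(cf, n, alpha=0.05, bartlett=True):
    """Hypothetical helper: half-widths of the (1 - alpha) ACF bounds for lags 1..len(cf)-1."""
    # Two-sided normal quantile, e.g. about 1.96 for alpha=0.05.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    bounds = []
    for k in range(1, len(cf)):
        if bartlett:
            # Assumed Bartlett approximation: variance grows with earlier squared autocorrelations.
            var = (1 + 2 * sum(r * r for r in cf[1:k])) / n
        else:
            # Assumed plain standard error: same variance at every lag.
            var = 1 / n
        bounds.append(z * math.sqrt(var))
    return bounds

cf = [1.0, 0.5, 0.2, 0.1]  # illustrative autocorrelations for lags 0..3
plain = acf_bounds_sketch(cf, n=100, bartlett=False)
wide = acf_bounds_sketch(cf, n=100, bartlett=True)
```

Under these assumptions the two bounds coincide at lag 1, and the Bartlett bound widens at higher lags because it accumulates the squared autocorrelations of the earlier lags.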