AutoARIMA
- class hana_ml.algorithms.pal.tsa.auto_arima.AutoARIMA(seasonal_period=None, seasonality_criterion=None, d=None, kpss_significance_level=None, max_d=None, seasonal_d=None, ch_significance_level=None, max_seasonal_d=None, max_p=None, max_q=None, max_seasonal_p=None, max_seasonal_q=None, information_criterion=None, search_strategy=None, max_order=None, initial_p=None, initial_q=None, initial_seasonal_p=None, initial_seasonal_q=None, guess_states=None, max_search_iterations=None, method=None, allow_linear=None, forecast_method=None, output_fitted=None, thread_ratio=None, background_size=None, massive=False, group_params=None)
The ARIMA model is a powerful tool for time series analysis, but choosing suitable parameters for it can be difficult. AutoARIMA automates this selection. The seasonal model has seven parameters (p, d, q, P, D, Q, and m). The seasonal period m can be estimated, for example with the seasonal_decompose() function. The differencing orders d and D are usually determined first, because information criteria are only comparable between models fitted to the same differenced series. The optimal values of p, q, P, and Q are then found with one of two strategies: an exhaustive search, which tests all possible combinations but can be time-consuming, or a stepwise search, which is more efficient but may not find the optimal model. Whether to include a constant term is also decided by the information criterion, and a constant is considered mainly when d + D is at most 1.
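For reference, the seasonal model whose orders AutoARIMA selects is conventionally written ARIMA(p, d, q)(P, D, Q)m. In standard Box-Jenkins notation, with backshift operator B (so that B y_t = y_{t-1}), white noise ε_t, and an optional constant c, it reads as follows. PAL's internal parameterization (for example, the sign convention of the MA terms) may differ.

```latex
\Phi(B^{m})\,\phi(B)\,(1 - B)^{d}\,(1 - B^{m})^{D}\,y_t = c + \Theta(B^{m})\,\theta(B)\,\varepsilon_t,
\\
\phi(B) = 1 - \sum_{i=1}^{p} \phi_i B^{i}, \quad
\theta(B) = 1 + \sum_{j=1}^{q} \theta_j B^{j}, \quad
\Phi(B^{m}) = 1 - \sum_{i=1}^{P} \Phi_i B^{im}, \quad
\Theta(B^{m}) = 1 + \sum_{j=1}^{Q} \Theta_j B^{jm}
```

Here p and q are the non-seasonal AR and MA orders, P and Q are their seasonal counterparts acting at lag m, and d and D are the numbers of ordinary and seasonal differences.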
- Parameters:
- seasonal_period : int, optional
Value of the seasonal period.
Negative: Automatically identify seasonality by means of an auto-correlation scheme.
0 or 1: Non-seasonal.
Others: Seasonal period.
Defaults to -1.
- seasonality_criterion : float, optional
The threshold of the auto-correlation coefficient for accepting seasonality, in the range (0, 1). The larger it is, the less likely a time series is to be regarded as seasonal. Valid only when seasonal_period is negative.
Defaults to 0.2.
- d : int, optional
Order of first-differencing.
Negative: Automatically identifies the first-differencing order with the KPSS test.
Others: Uses the specified value as the first-differencing order.
Defaults to -1.
- kpss_significance_level : float, optional
The significance level for the KPSS test. Supported values are 0.01, 0.025, 0.05, and 0.1. The smaller it is, the more likely a time series is considered stationary, that is, the less likely it needs first-differencing. Valid only when d is negative.
Defaults to 0.05.
- max_d : int, optional
The maximum value of d when the KPSS test is applied.
Defaults to 2.
- seasonal_d : int, optional
Order of seasonal-differencing.
Negative: Automatically identifies the seasonal-differencing order using the Canova-Hansen test.
Others: Uses the specified value as the seasonal-differencing order.
Defaults to -1.
- ch_significance_level : float, optional
The significance level for the Canova-Hansen test. Supported values are 0.01, 0.025, 0.05, 0.1, and 0.2. The smaller it is, the more likely a time series is considered seasonally stationary, that is, the less likely it needs seasonal-differencing.
Valid only when seasonal_d is negative.
Defaults to 0.05.
- max_seasonal_d : int, optional
The maximum value of seasonal_d when the Canova-Hansen test is applied.
Defaults to 1.
- max_p : int, optional
The maximum value of AR order p.
Defaults to 5.
- max_q : int, optional
The maximum value of MA order q.
Defaults to 5.
- max_seasonal_p : int, optional
The maximum value of SAR order P.
Defaults to 2.
- max_seasonal_q : int, optional
The maximum value of SMA order Q.
Defaults to 2.
- information_criterion : {'aicc', 'aic', 'bic'}, optional
The information criterion for order selection.
'aicc': Akaike information criterion with correction (for small sample sizes)
'aic': Akaike information criterion
'bic': Bayesian information criterion
Defaults to 'aicc'.
- search_strategy : {'exhaustive', 'stepwise'}, optional
Specifies the search strategy for the optimal ARMA model.
'exhaustive': exhaustive search over all candidate models.
'stepwise': stepwise search.
Defaults to 'stepwise'.
- max_order : int, optional
The maximum value of (max_p + max_q + max_seasonal_p + max_seasonal_q). Valid only when search_strategy is 'exhaustive'.
Defaults to 15.
- initial_p : int, optional
Order p of the user-defined initial model. Valid only when search_strategy is 'stepwise'.
Defaults to 0.
- initial_q : int, optional
Order q of the user-defined initial model. Valid only when search_strategy is 'stepwise'.
Defaults to 0.
- initial_seasonal_p : int, optional
Order seasonal_p of the user-defined initial model. Valid only when search_strategy is 'stepwise'.
Defaults to 0.
- initial_seasonal_q : int, optional
Order seasonal_q of the user-defined initial model. Valid only when search_strategy is 'stepwise'.
Defaults to 0.
- guess_states : int, optional
Whether to use ACF/PACF to guess initial ARMA models in addition to the user-defined model:
0: No guess. Besides the user-defined model, uses states (2, 2)(1, 1)m, (1, 0)(1, 0)m, and (0, 1)(0, 1)m as starting states.
1: Guesses starting states from the ACF/PACF.
Valid only when search_strategy is 'stepwise'.
Defaults to 1.
- max_search_iterations : int, optional
The maximum number of iterations for searching optimal ARMA states.
Valid only when search_strategy is 'stepwise'.
Defaults to (max_p + 1) * (max_q + 1) * (max_seasonal_p + 1) * (max_seasonal_q + 1).
- method : {'css', 'mle', 'css-mle'}, optional
The objective function for numeric optimization:
'css': use the conditional sum of squares.
'mle': use maximum likelihood estimation.
'css-mle': use css to approximate starting values first and then mle to fit.
Defaults to 'css-mle'.
- allow_linear : bool, optional
Controls whether to check the linear model ARMA(0,0)(0,0)m.
Defaults to True.
- forecast_method : {'formula_forecast', 'innovations_algorithm'}, optional
Specifies the forecast method for which the model stores information, for use in the subsequent forecast.
'formula_forecast': compute future series via formula.
'innovations_algorithm': apply the innovations algorithm to compute future series, which requires storing more information about the original data.
Defaults to 'innovations_algorithm'.
- output_fitted : bool, optional
Outputs fitted values and residuals if True.
Defaults to True.
- thread_ratio : float, optional
Adjusts the percentage of available threads to use, from 0 to 1. A value of 0 indicates the use of a single thread, while 1 implies the use of all currently available threads. Values outside this range are ignored, and the function heuristically determines the number of threads to use.
Defaults to -1.
- background_size : int, optional
Indicates the number of data points used in ARIMA with explanations in the predict function. To use ARIMA with explanations, you must set background_size to a positive value or -1 (auto mode) when initializing the instance, and then set show_explainer=True in the predict function.
Defaults to None (no explanations).
- massive : bool, optional
Specifies whether or not to activate massive mode.
True : massive mode.
False : single mode.
In massive mode, parameters can be set either through group_params or through the original parameters. Original parameters apply to all groups. However, if you define any parameter for a group in group_params, none of the original parameter settings apply to that group.
For example, if the parameter 'output_fitted' is set in group_params for Group_1 and Group_2, the original 'background_size' setting does not apply to Group_1 or Group_2.
Defaults to False.
- group_params : dict, optional
If massive mode is activated (massive is True), input data is divided into groups, and different parameters can be applied to each group.
Valid only when massive is True.
Defaults to None.
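The d, kpss_significance_level, and max_d parameters above describe a rule of the kind popularized by the R forecast package's auto.arima: difference the series until a KPSS test no longer rejects stationarity, at most max_d times. The following is a minimal pure-Python sketch of that rule, not PAL's implementation. It assumes a KPSS test for level stationarity with a Bartlett-kernel long-run variance and the lag truncation int(4 * (n / 100) ** 0.25); PAL may differ in these details. The critical values are the level-stationarity values tabulated by Kwiatkowski et al. (1992), one per supported significance level. The names kpss_statistic, difference, and choose_d are hypothetical.

```python
import random

# Level-stationarity critical values from Kwiatkowski et al. (1992), keyed by
# the significance levels that kpss_significance_level accepts.
KPSS_CRITICAL_VALUES = {0.1: 0.347, 0.05: 0.463, 0.025: 0.574, 0.01: 0.739}


def kpss_statistic(series):
    """KPSS statistic for the null hypothesis of level stationarity."""
    n = len(series)
    mean = sum(series) / n
    resid = [x - mean for x in series]

    # Partial sums of the demeaned series.
    partial_sums, running = [], 0.0
    for r in resid:
        running += r
        partial_sums.append(running)

    # Long-run variance with Bartlett weights; the lag rule is an assumption.
    lags = int(4 * (n / 100) ** 0.25)
    long_run_var = sum(r * r for r in resid) / n
    for j in range(1, lags + 1):
        weight = 1 - j / (lags + 1)
        gamma_j = sum(resid[t] * resid[t - j] for t in range(j, n)) / n
        long_run_var += 2 * weight * gamma_j

    if long_run_var <= 1e-12:
        return 0.0  # (near-)constant series: nothing to reject
    return sum(s * s for s in partial_sums) / (n * n * long_run_var)


def difference(series):
    """First difference: y_t - y_{t-1}."""
    return [b - a for a, b in zip(series, series[1:])]


def choose_d(series, significance_level=0.05, max_d=2):
    """Difference until the KPSS test stops rejecting, at most max_d times."""
    critical = KPSS_CRITICAL_VALUES[significance_level]
    d = 0
    while d < max_d and kpss_statistic(series) > critical:
        series = difference(series)
        d += 1
    return d


rng = random.Random(0)
trending = [0.5 * t + rng.gauss(0, 1) for t in range(200)]
print(choose_d(trending))    # at least 1: the raw series is clearly not level-stationary
print(choose_d([3.0] * 50))  # 0
```

The Canova-Hansen test behind seasonal_d plays the analogous role for seasonal differencing; it is not sketched here.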
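The three values of information_criterion are standard penalized-likelihood scores, where lower is better. The sketch below uses the textbook definitions for a fitted model with log-likelihood log_lik, k estimated parameters, and n observations. Exactly how PAL counts k (for example, whether the noise variance is included) is an assumption here, and the two candidate fits are made-up numbers for illustration.

```python
import math


def aic(log_lik, k):
    """Akaike information criterion."""
    return -2.0 * log_lik + 2.0 * k


def aicc(log_lik, k, n):
    """AIC with small-sample correction; requires n > k + 1."""
    return aic(log_lik, k) + 2.0 * k * (k + 1) / (n - k - 1)


def bic(log_lik, k, n):
    """Bayesian information criterion."""
    return -2.0 * log_lik + k * math.log(n)


# Hypothetical candidate fits on n = 60 points: (log-likelihood, parameter count).
candidates = {"ARMA(1,0)": (-120.0, 2), "ARMA(2,2)": (-118.5, 5)}
for name, (ll, k) in candidates.items():
    print(name, round(aicc(ll, k, 60), 2), round(bic(ll, k, 60), 2))
# Both criteria prefer ARMA(1,0): its small likelihood gain does not pay for 3 extra parameters.
```

BIC's penalty grows with log(n), so it favours smaller models than AIC on long series, while AICc mainly matters when n is small relative to k.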
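To make the difference between the two search_strategy values concrete, here is a minimal sketch over the (p, q, P, Q) grid. The callable score is a hypothetical stand-in for fitting a model and computing its information criterion. The stepwise neighbourhood used here (change one order by ±1) is a simplification of what real stepwise searches use, and interpreting max_order as a cap on p + q + P + Q is an assumption. With the default limits the exhaustive grid has 6 * 6 * 3 * 3 = 324 candidates, which equals the default of max_search_iterations.

```python
from itertools import product


def exhaustive_search(score, max_p=5, max_q=5, max_P=2, max_Q=2, max_order=15):
    """Score every (p, q, P, Q) whose total order is at most max_order."""
    grid = [
        order
        for order in product(range(max_p + 1), range(max_q + 1),
                             range(max_P + 1), range(max_Q + 1))
        if sum(order) <= max_order
    ]
    return min(grid, key=score), len(grid)


def stepwise_search(score, start_states, max_p=5, max_q=5, max_P=2, max_Q=2,
                    max_iterations=None):
    """Greedy local search: move to the best +/-1 neighbour until no improvement."""
    limits = (max_p, max_q, max_P, max_Q)
    if max_iterations is None:
        max_iterations = (max_p + 1) * (max_q + 1) * (max_P + 1) * (max_Q + 1)
    best = min(start_states, key=score)
    for _ in range(max_iterations):
        neighbours = []
        for i, limit in enumerate(limits):
            for step in (-1, 1):
                cand = list(best)
                cand[i] += step
                if 0 <= cand[i] <= limit:
                    neighbours.append(tuple(cand))
        if not neighbours:
            break
        candidate = min(neighbours, key=score)
        if score(candidate) >= score(best):
            break  # local optimum reached
        best = candidate
    return best


def toy_score(order):
    """Hypothetical stand-in for 'fit the model and return its AICc'."""
    p, q, P, Q = order
    return (p - 3) ** 2 + (q - 1) ** 2 + (P - 1) ** 2 + Q ** 2


best_exhaustive, n_models = exhaustive_search(toy_score)
# Starting states: a user-defined (0, 0, 0, 0) model plus the guess_states=0 defaults.
starts = [(0, 0, 0, 0), (2, 2, 1, 1), (1, 0, 1, 0), (0, 1, 0, 1)]
best_stepwise = stepwise_search(toy_score, starts)
print(best_exhaustive, n_models, best_stepwise)  # (3, 1, 1, 0) 324 (3, 1, 1, 0)
```

On this smooth toy score both strategies agree. With a real information criterion the stepwise search can stop at a local optimum, which is the trade-off the description of the two strategies refers to.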
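The 'css' option of method minimizes the conditional sum of squares: residuals are computed recursively, conditioning on the first observation and setting pre-sample errors to zero, and their squares are summed. Below is a minimal sketch for a zero-mean ARMA(1, 1) without differencing or seasonal terms, which is a simplification of what PAL optimizes. The function css_arma11 and the crude grid search are hypothetical illustrations. 'mle' would instead maximize the exact Gaussian likelihood, and 'css-mle' uses the CSS estimates as starting values for that.

```python
def css_arma11(series, phi, theta):
    """Conditional sum of squares for y_t = phi*y_{t-1} + e_t + theta*e_{t-1}."""
    prev_error = 0.0  # pre-sample error conditioned to zero
    total = 0.0
    for t in range(1, len(series)):
        error = series[t] - phi * series[t - 1] - theta * prev_error
        total += error * error
        prev_error = error
    return total


# A noise-free AR(1) path with phi = 0.5: the true parameters give zero CSS.
path = [8.0, 4.0, 2.0, 1.0, 0.5]
print(css_arma11(path, phi=0.5, theta=0.0))  # 0.0

# A crude grid search stands in for the numeric optimizer.
grid = [i / 10 for i in range(10)]
print(min(grid, key=lambda phi: css_arma11(path, phi, 0.0)))  # 0.5
```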
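The override rule for massive mode can be stated precisely: for a group listed in group_params, only that group's own settings apply, and every other group falls back to the original (class-level) parameters. The following is a minimal sketch of the rule, assuming group_params maps each group ID to a dict of parameter names and values. resolve_group_params is a hypothetical helper, not part of hana_ml, and the parameter values are illustrative.

```python
def resolve_group_params(original_params, group_params, group_id):
    """Return the parameter settings that apply to one group in massive mode."""
    group_params = group_params or {}
    if group_id in group_params:
        # Defining any parameter for a group discards all original settings for it.
        return dict(group_params[group_id])
    return dict(original_params)


original = {"background_size": 5}
per_group = {"Group_1": {"output_fitted": False}, "Group_2": {"output_fitted": True}}
print(resolve_group_params(original, per_group, "Group_1"))  # {'output_fitted': False}
print(resolve_group_params(original, per_group, "Group_3"))  # {'background_size': 5}
```

In hana_ml, the corresponding settings would be passed to the AutoARIMA constructor as massive=True together with group_params and the original parameters.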
- Attributes:
- model_ : DataFrame
Model content.
- fitted_ : DataFrame
Fitted values and residuals.
- explainer_ : DataFrame
The decomposition into trend, seasonal, transitory, and irregular components, plus the reason code of exogenous variables. Only contains values after show_explainer=True is set in the predict() function.
- permutation_importance_ : DataFrame
The importance of exogenous variables as determined by permutation importance analysis. This attribute only appears after get_permutation_importance() is invoked on a trained model, and is structured as follows:
1st column : PAIR, measure name.
2nd column : NAME, exogenous regressor name.
3rd column : VALUE, the importance of the exogenous regressor.
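Permutation importance measures how much a forecast accuracy measure degrades when one exogenous column is randomly shuffled, which breaks that column's link with the target. The following is a generic sketch of the technique with a hypothetical model function and mean squared error as the accuracy measure. It borrows the names repeat_time and random_state from get_permutation_importance() for readability, but it does not reproduce PAL's data partitioning, accuracy_measure options, or output layout.

```python
import random


def mse(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def permutation_importance(predict, columns, target, repeat_time=5, random_state=0):
    """Average increase in MSE when each exogenous column is shuffled."""
    rng = random.Random(random_state)
    baseline = mse(target, predict(columns))
    importance = {}
    for name in columns:
        increases = []
        for _ in range(repeat_time):
            shuffled = list(columns[name])
            rng.shuffle(shuffled)
            permuted = dict(columns, **{name: shuffled})
            increases.append(mse(target, predict(permuted)) - baseline)
        importance[name] = sum(increases) / repeat_time
    return importance


def toy_predict(cols):
    """Hypothetical fitted model: the forecast depends on X1 only."""
    return [2.0 * x1 for x1 in cols["X1"]]


exog = {"X1": [float(i) for i in range(20)], "X2": [float(i % 3) for i in range(20)]}
target = toy_predict(exog)
imp = permutation_importance(toy_predict, exog, target)
print(imp)  # X1 gets a positive score; X2 gets 0.0 because the model ignores it
```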
Methods
- fit(data[, key, endog, exog, group_key, ...]): Fit the model to the training dataset.
- get_permutation_importance(data[, model, ...]): Please see Permutation Feature Importance for Time Series for details.
- predict([data, key, group_key, ...]): Generates time series forecasts based on the fitted model.
- set_conn(connection_context): Set connection context for an ARIMA instance.
Examples
Create an AutoARIMA instance:
>>> autoarima = AutoARIMA(search_strategy='stepwise', allow_linear=True, thread_ratio=1.0, background_size=-1)
Perform fit():
>>> autoarima.fit(data=df)
Output:
>>> autoarima.model_.collect()
>>> autoarima.fitted_.collect()
Perform predict():
>>> result = autoarima.predict(forecast_method='innovations_algorithm', forecast_length=10)
>>> result.collect()
If you want to see the decomposition of the forecast, set show_explainer=True (this requires background_size to have been set, for example to -1, when the AutoARIMA instance was created):
>>> result = autoarima.predict(forecast_method='innovations_algorithm', forecast_length=10, allow_new_index=False, show_explainer=True)
Show the attribute explainer_ of the AutoARIMA instance:
>>> autoarima.explainer_.collect()
- fit(data, key=None, endog=None, exog=None, group_key=None, group_params=None, categorical_variable=None)
Fit the model to the training dataset.
- Parameters:
- data : DataFrame
Input data with at least two columns: key and endog.
ARIMAX, which uses additional exogenous variables, is also supported.
- key : str, optional
The timestamp column of data. The type of the key column should be INTEGER, TIMESTAMP, DATE, or SECONDDATE.
In single mode, defaults to the first column of data if the index column of data is not provided. Otherwise, defaults to the index column of data.
In massive mode, defaults to the first non-group_key column of data if the index columns of data are not provided. Otherwise, defaults to the second index column of data, and the first index column is used as group_key.
- endog : str, optional
The endogenous variable, i.e., the time series. The type of the endog column should be INTEGER, DOUBLE, or DECIMAL(p,s).
In single mode, defaults to the first non-key column. In massive mode, defaults to the first column that is neither group_key nor key.
- exog : list of str, optional
An optional list of exogenous variables. The type of each exog column should be INTEGER, DOUBLE, or DECIMAL(p,s).
Valid only for auto ARIMAX.
Defaults to None. Please set this parameter explicitly if you have exogenous variables.
- group_key : str, optional
The group_key column. The data type can be INT or NVARCHAR/VARCHAR. If the data type is INT, only parameters set in group_params are valid.
This parameter is valid only when massive mode is activated (i.e., massive is set to True when initializing the class instance).
Defaults to the first column of data if the index columns of data are not provided. Otherwise, defaults to the first index column.
- group_params : dict, optional
If massive mode is activated (massive is set to True when initializing the class instance), input data is divided into groups, and different parameters can be applied to each group.
Valid only when massive is True.
- categorical_variable : str or list of str, optional
Specifies INTEGER columns that should be treated as categorical. Other INTEGER columns are treated as continuous.
Defaults to None.
- Returns:
- A fitted object of class "AutoARIMA".
- get_permutation_importance(data, model=None, key=None, endog=None, exog=None, repeat_time=None, random_state=None, thread_ratio=None, partition_ratio=None, regressor_top_k=None, accuracy_measure=None, ignore_zero=None)
Please see Permutation Feature Importance for Time Series for details.
- predict(data=None, key=None, group_key=None, group_params=None, forecast_method=None, forecast_length=None, allow_new_index=False, show_explainer=False, thread_ratio=None, top_k_attributions=None, trend_mod=None, trend_width=None, seasonal_width=None)
Generates time series forecasts based on the fitted model.
- Parameters:
- data : DataFrame, optional
Index and exogenous variables for forecasting. For ARIMAX only.
Defaults to None.
- key : str, optional
The timestamp column of data. The data type of the key column should be INTEGER, TIMESTAMP, DATE, or SECONDDATE. For ARIMAX only.
In massive mode, defaults to the first non-group_key column of data if the index columns of data are not provided. Otherwise, defaults to the second index column of data, and the first index column is used as group_key.
- group_key : str, optional
The group_key column. The data type can be INT or NVARCHAR/VARCHAR. This parameter is valid only when massive is True.
Defaults to the first column of data if the index columns of data are not provided. Otherwise, defaults to the first index column.
- group_params : dict, optional
If massive mode is activated (massive is True in class instance initialization), input data is divided into groups, and different parameters can be applied to each group.
Valid only when massive is True.
Defaults to None.
- forecast_method : {'formula_forecast', 'innovations_algorithm', 'truncation_algorithm'}, optional
Specify the forecast method.
'formula_forecast': forecast via formula.
'innovations_algorithm': apply innovations algorithm to forecast.
'truncation_algorithm': a forecast method much faster than the innovations algorithm when the AR representation of the ARIMA model can be truncated to finite order.
Defaults to 'innovations_algorithm' if the forecast_method parameter was not set, or was set to 'innovations_algorithm', in class initialization; otherwise defaults to 'formula_forecast'.
- forecast_length : int, optional
Number of points to forecast. Valid only when data is None.
In ARIMAX, the forecast length equals the length of the input predict data.
Defaults to None.
- allow_new_index : bool, optional
Whether to recalculate and output the index column of the forecast result based on the type of the fitting data's index column.
True: The index column in the forecast result will be recalculated to match the type and sequence of the fitting data's index column.
False: The forecast result will output the original result from HANA PAL, which may use an integer index even if the fitting data's index column is a timestamp.
Defaults to False.
- show_explainer : bool, optional
Indicates whether to invoke ARIMA with explanations in predict(). Only valid when background_size was set when initializing the instance.
If True, the contributions of the trend, seasonal, transitory, irregular, and exogenous components are stored in the explainer_ attribute of the ARIMA / AutoARIMA instance.
Defaults to False.
- thread_ratio : float, optional
Adjusts the percentage of available threads to use, from 0 to 1. A value of 0 indicates the use of a single thread, while 1 implies the use of all currently available threads. Values outside this range are ignored, and the function heuristically determines the number of threads to use. Valid only when show_explainer is True.
Defaults to -1.
- top_k_attributions : int, optional
Specifies the number of attributes with the largest contributions to output. Attributes with zero contribution are not output. Valid only when show_explainer is True.
Defaults to 10.
- trend_mod : float, optional
Real AR roots with an inverse modulus larger than this value are integrated into the trend component. Valid only when show_explainer is True. Cannot be smaller than 0.
Defaults to 0.4.
- trend_width : float, optional
Specifies the bandwidth of the spectrum of the trend component in units of rad. Valid only when show_explainer is True. Cannot be smaller than 0.
Defaults to 0.035.
- seasonal_width : float, optional
Specifies the bandwidth of the spectrum of the seasonal component in units of rad. Valid only when show_explainer is True. Cannot be smaller than 0.
Defaults to 0.035.
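forecast_method='formula_forecast' computes future values from the model equation. For an ARMA model this amounts to the standard recursion: each step applies the AR coefficients to past values (observed or already forecast) and the MA coefficients to past residuals, with future shocks replaced by their expected value of zero. The following is a minimal sketch, assuming the series has already been differenced (undifferencing is omitted) and that residuals covers at least the MA order. arma_forecast is a hypothetical helper, not PAL's implementation.

```python
def arma_forecast(history, residuals, phi, theta, c=0.0, steps=1):
    """Point forecasts for y_t = c + sum(phi_i*y_{t-i}) + e_t + sum(theta_j*e_{t-j})."""
    values = list(history)
    errors = list(residuals)
    forecasts = []
    for _ in range(steps):
        ar = sum(ph * values[-i] for i, ph in enumerate(phi, start=1))
        ma = sum(th * errors[-j] for j, th in enumerate(theta, start=1))
        y_hat = c + ar + ma
        forecasts.append(y_hat)
        values.append(y_hat)
        errors.append(0.0)  # future shocks have expectation zero
    return forecasts


print(arma_forecast([1.0, 8.0], [], phi=[0.5], theta=[], steps=3))             # [4.0, 2.0, 1.0]
print(arma_forecast([1.0, 8.0], [0.0, 2.0], phi=[0.5], theta=[0.25], steps=2))  # [4.5, 2.25]
```

The MA terms only affect the first q steps, after which the forecast is driven by the AR part alone.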
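trend_mod refers to the inverse roots of the AR polynomial. A real inverse root r corresponds to a factor (1 - rB), and values of |r| close to 1 produce slowly decaying, trend-like behaviour. The following is a minimal sketch for an AR(2) polynomial 1 - phi1*B - phi2*B^2, whose inverse roots are the roots of z^2 - phi1*z - phi2. How PAL assigns complex roots to the seasonal or transitory components through trend_width and seasonal_width is not modelled. ar2_inverse_roots and trend_roots are hypothetical helpers.

```python
import cmath


def ar2_inverse_roots(phi1, phi2):
    """Inverse roots of 1 - phi1*B - phi2*B^2, i.e. roots of z^2 - phi1*z - phi2."""
    disc = cmath.sqrt(phi1 * phi1 + 4 * phi2)
    return [(phi1 + disc) / 2, (phi1 - disc) / 2]


def trend_roots(phi1, phi2, trend_mod=0.4, tol=1e-12):
    """Real inverse roots whose modulus exceeds trend_mod (candidates for the trend)."""
    return [
        r.real
        for r in ar2_inverse_roots(phi1, phi2)
        if abs(r.imag) < tol and abs(r) > trend_mod
    ]


print(trend_roots(1.1, -0.28, trend_mod=0.5))  # about [0.7]; the other real root, 0.4, stays out
print(trend_roots(0.0, -0.81))                 # [] (complex pair with modulus 0.9)
```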
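With allow_new_index=True, the integer positions of the forecast are mapped back onto the time axis of the fitting data. Assuming equally spaced timestamps, that mapping can be sketched as continuing the series from its last timestamp with the observed interval. rebuild_index is a hypothetical helper, and hana_ml may infer the interval differently (for example, for irregular or INTEGER keys).

```python
from datetime import datetime, timedelta


def rebuild_index(fit_timestamps, n_forecast):
    """Continue an equally spaced timestamp index for n_forecast future points."""
    step = fit_timestamps[1] - fit_timestamps[0]  # assumes a regular interval
    last = fit_timestamps[-1]
    return [last + step * (i + 1) for i in range(n_forecast)]


history = [datetime(2024, 1, 1) + timedelta(days=i) for i in range(5)]
print(rebuild_index(history, 3))  # datetimes for 2024-01-06, 2024-01-07, and 2024-01-08
```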
- Returns:
- DataFrame 1
Forecasted values.
- DataFrame 2 (optional)
The explanations with decomposition of trend, seasonal, transitory, irregular, and reason code of exogenous variables. Only valid if show_explainer is True.
- DataFrame 3 (optional)
Error message. Only valid if massive is True.
- set_conn(connection_context)
Set connection context for an ARIMA instance.
- Parameters:
- connection_context : ConnectionContext
The connection to the SAP HANA system.
- Returns:
- None.
Inherited Methods from PALBase
Besides the methods mentioned above, the AutoARIMA class also inherits methods from the PALBase class. Please refer to PAL Base for more details.