AutoARIMA
- class hana_ml.algorithms.pal.tsa.auto_arima.AutoARIMA(seasonal_period=None, seasonality_criterion=None, d=None, kpss_significance_level=None, max_d=None, seasonal_d=None, ch_significance_level=None, max_seasonal_d=None, max_p=None, max_q=None, max_seasonal_p=None, max_seasonal_q=None, information_criterion=None, search_strategy=None, max_order=None, initial_p=None, initial_q=None, initial_seasonal_p=None, initial_seasonal_q=None, guess_states=None, max_search_iterations=None, method=None, allow_linear=None, forecast_method=None, output_fitted=None, thread_ratio=None, background_size=None, massive=False, group_params=None)
Although the ARIMA model is useful and powerful in time series analysis, it is somewhat difficult to choose appropriate orders. The AutoARIMA function therefore identifies the orders of an ARIMA model automatically.
- Parameters
- seasonal_periodint, optional
Value of the seasonal period.
Negative: Automatically identify seasonality by means of auto-correlation scheme.
0 or 1: Non-seasonal.
Others: Seasonal period.
Defaults to -1.
- seasonality_criterionfloat, optional
The criterion of the auto-correlation coefficient for accepting seasonality, in the range of (0, 1).
The larger it is, the less likely a time series is to be regarded as seasonal.
Valid only when seasonal_period is negative.
Defaults to 0.2.
- dint, optional
Order of first-differencing.
Negative: Automatically identifies first-differencing order with KPSS test.
Others: Uses the specified value as the first-differencing order.
Defaults to -1.
- kpss_significance_levelfloat, optional
The significance level for KPSS test. Supported values are 0.01, 0.025, 0.05, and 0.1.
The smaller it is, the more likely a time series is to be considered first-stationary, that is, the less likely it is to need first-differencing.
Valid only when d is negative.
Defaults to 0.05.
- max_dint, optional
The maximum value of d when KPSS test is applied.
Defaults to 2.
- seasonal_dint, optional
Order of seasonal-differencing.
Negative: Automatically identifies seasonal-differencing order with Canova-Hansen test.
Others: Uses the specified value as the seasonal-differencing order.
Defaults to -1.
- ch_significance_levelfloat, optional
The significance level for Canova-Hansen test. Supported values are 0.01, 0.025, 0.05, 0.1, and 0.2.
The smaller it is, the more likely a time series is to be considered seasonal-stationary, that is, the less likely it is to need seasonal-differencing.
Valid only when seasonal_d is negative.
Defaults to 0.05.
- max_seasonal_dint, optional
The maximum value of seasonal_d when Canova-Hansen test is applied.
Defaults to 1.
- max_pint, optional
The maximum value of AR order p.
Defaults to 5.
- max_qint, optional
The maximum value of MA order q.
Defaults to 5.
- max_seasonal_pint, optional
The maximum value of SAR order P.
Defaults to 2.
- max_seasonal_qint, optional
The maximum value of SMA order Q.
Defaults to 2.
- information_criterion{'aicc', 'aic', 'bic'}, optional
The information criterion for order selection.
'aicc': Akaike information criterion with correction (for small sample sizes)
'aic': Akaike information criterion
'bic': Bayesian information criterion
Defaults to 'aicc'.
- search_strategy{'exhaustive', 'stepwise'}, optional
Specifies the search strategy for optimal ARMA model.
'exhaustive': exhaustive traverse.
'stepwise': stepwise traverse.
Defaults to 'stepwise'.
- max_orderint, optional
The maximum value of (max_p + max_q + max_seasonal_p + max_seasonal_q).
Valid only when search_strategy is 'exhaustive'.
Defaults to 15.
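The effect of max_order on an exhaustive search can be sketched with a small counting routine. This is only an illustration: it assumes that max_order bounds the sum p + q + P + Q of each candidate model, and the helper name count_exhaustive_candidates is hypothetical, not part of hana_ml.

```python
from itertools import product


def count_exhaustive_candidates(max_p=5, max_q=5, max_seasonal_p=2,
                                max_seasonal_q=2, max_order=15):
    """Count (p, q, P, Q) combinations whose orders sum to at most max_order.

    Assumption: max_order caps p + q + P + Q for every candidate model.
    The defaults mirror the documented parameter defaults.
    """
    grid = product(range(max_p + 1), range(max_q + 1),
                   range(max_seasonal_p + 1), range(max_seasonal_q + 1))
    return sum(1 for orders in grid if sum(orders) <= max_order)


# With the defaults the largest possible sum is 5 + 5 + 2 + 2 = 14,
# so max_order=15 does not exclude any candidate.
default_count = count_exhaustive_candidates()

# A tighter bound prunes the grid considerably.
restricted_count = count_exhaustive_candidates(max_order=5)

print(default_count, restricted_count)
```

Under this reading, lowering max_order is a cheap way to shrink an exhaustive search without changing the individual order limits.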
- initial_pint, optional
Order p of user-defined initial model.
Valid only when search_strategy is 'stepwise'.
Defaults to 0.
- initial_qint, optional
Order q of user-defined initial model.
Valid only when search_strategy is 'stepwise'.
Defaults to 0.
- initial_seasonal_pint, optional
Order seasonal_p of user-defined initial model.
Valid only when search_strategy is 'stepwise'.
Defaults to 0.
- initial_seasonal_qint, optional
Order seasonal_q of user-defined initial model.
Valid only when search_strategy is 'stepwise'.
Defaults to 0.
- guess_statesint, optional
Whether to employ ACF/PACF to guess initial ARMA models, in addition to the user-defined model:
0: No guess. In addition to the user-defined model, uses states (2, 2) (1, 1)m, (1, 0) (1, 0)m, and (0, 1) (0, 1)m as starting states.
1: Guesses starting states taking advantage of ACF/PACF.
Valid only when search_strategy is 'stepwise'.
Defaults to 1.
- max_search_iterationsint, optional
The maximum number of iterations for searching optimal ARMA states.
Valid only when search_strategy is 'stepwise'.
Defaults to (max_p + 1) * (max_q + 1) * (max_seasonal_p + 1) * (max_seasonal_q + 1).
- method{'css', 'mle', 'css-mle'}, optional
The objective function for numeric optimization.
'css': use the conditional sum of squares.
'mle': use the maximum likelihood estimation.
'css-mle': use css to approximate starting values first and then mle to fit.
Defaults to 'css-mle'.
- allow_linearbool, optional
Controls whether to check linear model ARMA(0,0)(0,0)m.
Defaults to True.
- forecast_method{'formula_forecast', 'innovations_algorithm'}, optional
Store information for the subsequent forecast method.
'formula_forecast': compute future series via formula.
'innovations_algorithm': apply innovations algorithm to compute future series, which requires more original information to be stored.
Defaults to 'innovations_algorithm'.
- output_fittedbool, optional
Output fitted result and residuals if True.
Defaults to True.
- thread_ratiofloat, optional
Controls the proportion of available threads to use.
0: single thread.
0~1: percentage.
Others: heuristically determined.
Defaults to -1.
- background_sizeint, optional
Indicates the number of data points used in ARIMA with explanations in the predict function. If you want to use the ARIMA with explanations, you must set background_size to a positive value or -1 (auto mode) when initializing an ARIMA instance and then set show_explainer=True in the predict function.
Defaults to NULL (no explanations).
- massivebool, optional
Specifies whether or not to activate massive mode.
For parameter setting in massive mode, you could use either group_params or the original parameters. Original parameters apply to all groups. However, if you define some parameters for a group, none of the original parameter settings apply to that group.
For example, if the parameter 'output_fitted' is set in group_params for Group_1 and Group_2, the setting of an original parameter such as 'background_size' is not applicable to Group_1 and Group_2.
Defaults to False.
- group_paramsdict, optional
If massive mode is activated (massive is True), input data is divided into different groups with different parameters applied.
Valid only when massive is True.
Defaults to None.
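The interaction between original parameters and group_params in massive mode can be sketched as follows. This is a minimal illustration, not the library's implementation: the group IDs, the value 5 for background_size, the dict-of-dicts shape of group_params, and the helper effective_params are all assumptions made for the example.

```python
# Hypothetical settings; group IDs and values are illustrative only.
original_params = {"background_size": 5}
group_params = {
    "Group_1": {"output_fitted": False},
    "Group_2": {"output_fitted": False},
}


def effective_params(group_id, original_params, group_params):
    """Return the settings that apply to one group under the documented rule.

    A group that has its own entry in group_params uses only that entry;
    any other group falls back to the original parameters.
    """
    if group_id in group_params:
        return dict(group_params[group_id])
    return dict(original_params)


def build_massive_model():
    """Construct the estimator with these settings (requires hana_ml)."""
    # Imported lazily so the sketch above runs without hana_ml installed.
    from hana_ml.algorithms.pal.tsa.auto_arima import AutoARIMA
    return AutoARIMA(massive=True, group_params=group_params, **original_params)


# Group_1 has its own settings, so background_size does not reach it.
print(effective_params("Group_1", original_params, group_params))
# Group_3 has no entry, so the original parameters apply.
print(effective_params("Group_3", original_params, group_params))
```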
Examples
Input DataFrame df for AutoARIMA:
>>> df.head(5).collect()
   TIMESTAMP       Y
0          1 -24.525
1          2  34.720
2          3  57.325
3          4  10.340
4          5 -12.890
Create AutoARIMA instance:
>>> autoarima = AutoARIMA(search_strategy='stepwise', allow_linear=True, thread_ratio=1.0)
Perform fit on the given data df:
>>> autoarima.fit(data=df)
Show the output:
>>> autoarima.model_.head(4).collect()
  KEY     VALUE
0   p         1
1  AR  0.255777
2   d         0
3   q         1
>>> autoarima.fitted_.head(6).collect()
   TIMESTAMP     FITTED  RESIDUALS
0          1        NaN        NaN
1          2        NaN        NaN
2          3        NaN        NaN
3          4        NaN        NaN
4          5  24.525000  11.635000
5          6  37.583931   1.461069
Perform predict on the model:
>>> result = autoarima.predict(forecast_method='innovations_algorithm', forecast_length=10)
Show the output:
>>> result.collect()
   TIMESTAMP   FORECAST        SE       LO80       HI80       LO95       HI95
0          0 -15.544837  3.298697 -19.772288 -11.317385 -22.010164  -9.079510
1          1  35.587387  3.404892  31.223840  39.950934  28.913920  42.260853
2          2  56.498514  3.411725  52.126211  60.870817  49.811656  63.185372
If you want to see the decomposed result of the predict result, you could set show_explainer=True:
>>> result = autoarima.predict(forecast_method='innovations_algorithm', forecast_length=10, allow_new_index=False, show_explainer=True)
Show the attribute explainer_ of the AutoARIMA instance:
>>> autoarima.explainer_.head(5).collect()
   TIMESTAMP     TREND  SEASONAL  TRANSITORY  IRREGULAR EXOGENOUS
0          0  0.145204 -0.932973    0.927403 -24.937056
1          1  4.611087  0.336859   12.945590  25.755525
2          2  6.612419  0.815589   17.154548  47.954952
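Because the parameter descriptions state that explanations require background_size to be set at initialization, a complete explainer workflow might look like the sketch below. It is an assumption-laden outline rather than a verified recipe: the function name fit_and_explain and the two keyword dictionaries are hypothetical, df stands for a HANA DataFrame like the one shown above, and the choice of -1 (auto mode) is illustrative.

```python
# Settings taken from the parameter descriptions; values are illustrative.
init_kwargs = {"search_strategy": "stepwise", "background_size": -1}
predict_kwargs = {"forecast_length": 10, "show_explainer": True}


def fit_and_explain(df):
    """Fit AutoARIMA on df and return (forecast, explainer) DataFrames.

    Requires hana_ml and a HANA DataFrame; imported lazily so that the
    configuration above can be inspected without a database connection.
    """
    from hana_ml.algorithms.pal.tsa.auto_arima import AutoARIMA

    model = AutoARIMA(**init_kwargs)   # background_size set at initialization
    model.fit(data=df)
    forecast = model.predict(**predict_kwargs)
    return forecast, model.explainer_  # populated by show_explainer=True
```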
- Attributes
- model_DataFrame
Model content.
- fitted_DataFrame
Fitted values and residuals.
- explainer_DataFrame
The decomposition of trend, seasonal, transitory, irregular and reason code of exogenous variables. Only contains values after the predict function has been called with show_explainer=True.
Methods
build_report() Generate time series report.
fit(data[, key, endog, exog, group_key, ...]) Generates ARIMA models with given parameters.
generate_html_report([filename]) Display function.
generate_notebook_iframe_report() Display function.
predict([data, key, group_key, ...]) Makes time series forecast based on the estimated ARIMA model.
set_conn(connection_context) Set connection context for an ARIMA instance.
- fit(data, key=None, endog=None, exog=None, group_key=None, group_params=None, categorical_variable=None)
Generates ARIMA models with given parameters.
- Parameters
- dataDataFrame
Input data, which must have at least two columns: key and endog.
We also support ARIMAX which needs external data (exogenous variables).
- keystr, optional
The timestamp column of data. The type of key column should be INTEGER, TIMESTAMP, DATE or SECONDDATE.
In single mode, defaults to the first column of data if the index column of data is not provided. Otherwise, defaults to the index column of data.
In massive mode, defaults to the first non-group-key column of data if the index columns of data are not provided. Otherwise, defaults to the second of the index columns of data, and the first of the index columns is group_key.
- endogstr, optional
The endogenous variable, i.e. time series. The type of endog column should be INTEGER, DOUBLE or DECIMAL(p,s).
In single mode, defaults to the first non-key column. In massive mode, defaults to the first non group_key, non key column.
- exoglist of str, optional
An optional array of exogenous variables. The type of exog column should be INTEGER, DOUBLE or DECIMAL(p,s).
Valid only for Auto ARIMAX.
Defaults to None. Please set this parameter explicitly if you have exogenous variables.
- group_keystr, optional
The column of group_key. Data type can be INT or NVARCHAR/VARCHAR. If data type is INT, only parameters set in the group_params are valid.
This parameter is valid only when massive mode is activated (i.e. parameter massive is set as True in class instance initialization).
Defaults to the first column of data if the index columns of data are not provided. Otherwise, defaults to the first column of index columns.
- group_paramsdict, optional
If massive mode is activated (massive is set True in class instance initialization), input data is divided into different groups with different parameters applied.
Valid only when massive is True.
- categorical_variablestr or list of str, optional
Specifies INTEGER columns that should be treated as categorical.
Other INTEGER columns will be treated as continuous.
Defaults to None.
- Returns
- A fitted object of class "AutoARIMA".
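The single-mode defaults for key, endog and exog described above can be summarized with a small helper. This is a sketch of the documented rules only, for the case where data has no index column; resolve_fit_columns and the column names are hypothetical and do not exist in hana_ml.

```python
def resolve_fit_columns(columns, key=None, endog=None, exog=None):
    """Mimic the documented single-mode defaults of fit().

    Assumes data has no index column set:
    - key defaults to the first column,
    - endog defaults to the first non-key column,
    - exog is never inferred and stays empty unless given explicitly.
    """
    if key is None:
        key = columns[0]
    if endog is None:
        endog = next(c for c in columns if c != key)
    exog = list(exog) if exog else []
    return {"key": key, "endog": endog, "exog": exog}


# X1 is not picked up as an exogenous variable unless it is named explicitly.
print(resolve_fit_columns(["TIMESTAMP", "Y", "X1"]))
print(resolve_fit_columns(["TIMESTAMP", "Y", "X1"], exog=["X1"]))
```

The second call corresponds to the ARIMAX case, where exog has to be passed explicitly.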
- build_report()
Generate time series report.
- generate_html_report(filename=None)
Display function.
- property fit_hdbprocedure
Returns the generated hdbprocedure for fit.
- generate_notebook_iframe_report()
Display function.
- predict(data=None, key=None, group_key=None, group_params=None, forecast_method=None, forecast_length=None, allow_new_index=False, show_explainer=False, thread_ratio=None, top_k_attributions=None, trend_mod=None, trend_width=None, seasonal_width=None)
Makes time series forecast based on the estimated ARIMA model.
- Parameters
- dataDataFrame, optional
Index and exogenous variables for forecast. For ARIMAX only.
Defaults to None.
- keystr, optional
The timestamp column of data. The data type of key column should be INTEGER, TIMESTAMP, DATE or SECONDDATE. For ARIMAX only.
In massive mode, defaults to the first non-group-key column of data if the index columns of data are not provided. Otherwise, defaults to the second of the index columns of data, and the first of the index columns is group_key.
- group_keystr, optional
The column of group_key. Data type can be INT or NVARCHAR/VARCHAR. If data type is INT, only parameters set in the group_params are valid.
This parameter is only valid when massive is True.
Defaults to the first column of data if the index columns of data are not provided. Otherwise, defaults to the first column of index columns.
- group_paramsdict, optional
If massive mode is activated (massive is True in class instance initialization), input data is divided into different groups with different parameters applied.
Valid only when massive is True.
Defaults to None.
- forecast_method{'formula_forecast', 'innovations_algorithm', 'truncation_algorithm'}, optional
Specify the forecast method.
'formula_forecast': forecast via formula.
'innovations_algorithm': apply innovations algorithm to forecast.
'truncation_algorithm': a forecast method much faster than innovations algorithm when the AR representation of ARIMA model can be truncated to finite order.
Defaults to 'innovations_algorithm' if, in class initialization, the parameter forecast_method is not set or is set as 'innovations_algorithm'; otherwise defaults to 'formula_forecast'.
- forecast_lengthint, optional
Number of points to forecast.
Valid only when data is None.
In ARIMAX, the forecast length is the same as the length of the input data.
Defaults to None.
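The default rule for forecast_method given above depends on what was passed at class initialization, which can be easier to follow as a lookup. The function below is a hypothetical restatement of that rule for illustration; it does not model whether PAL accepts every combination of initialization and predict settings.

```python
def resolve_forecast_method(init_forecast_method=None,
                            predict_forecast_method=None):
    """Return the forecast method predict() would use under the documented rule.

    An explicit value passed to predict() is taken as-is. Otherwise the
    default follows the value given at class initialization.
    """
    if predict_forecast_method is not None:
        return predict_forecast_method
    if init_forecast_method in (None, "innovations_algorithm"):
        return "innovations_algorithm"
    return "formula_forecast"


print(resolve_forecast_method())
print(resolve_forecast_method(init_forecast_method="formula_forecast"))
```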
- allow_new_indexbool, optional
Indicate whether a new index column is allowed in the result.
True: return the result with new integer or timestamp index column.
False: return the result with index column starting from 0.
Defaults to False.
- show_explainerbool, optional
Indicate whether to invoke the ARIMA with explanations function in the predict. Only valid when background_size is set when initializing an ARIMA instance.
If True, the contributions of trend, seasonal, transitory, irregular and exogenous are shown in an attribute called explainer_ of the ARIMA/AutoARIMA instance.
Defaults to False.
- thread_ratiofloat, optional
Controls the proportion of available threads to use.
0: single thread
0~1: percentage
Others: heuristically determined
Valid only when show_explainer is True.
Defaults to -1.
- top_k_attributionsint, optional
Specifies the number of attributes with the largest contribution that will be output. 0-contributed attributes will not be output.
Valid only when show_explainer is True.
Defaults to 10.
- trend_moddouble, optional
The real AR roots with inverse modulus larger than trend_mod will be integrated into the trend component. Cannot be smaller than 0.
Valid only when show_explainer is True.
Defaults to 0.4.
- trend_widthdouble, optional
Specifies the bandwidth of spectrum of trend component in unit of rad. Cannot be smaller than 0.
Valid only when show_explainer is True.
Defaults to 0.035.
- seasonal_widthdouble, optional
Specifies the bandwidth of spectrum of seasonal component in unit of rad. Cannot be smaller than 0.
Valid only when show_explainer is True.
Defaults to 0.035.
- Returns
- DataFrame
Forecasted values, structured as follows:
ID, type INTEGER or TIMESTAMP.
FORECAST, type DOUBLE, forecast value.
SE, type DOUBLE, standard error.
LO80, type DOUBLE, low 80% value.
HI80, type DOUBLE, high 80% value.
LO95, type DOUBLE, low 95% value.
HI95, type DOUBLE, high 95% value.
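The relationship between FORECAST, SE and the interval columns can be checked against the first row of the sample predict output shown in the Examples. The sketch below assumes symmetric normal-theory intervals (FORECAST plus or minus a normal quantile times SE); the sample numbers are consistent with that reading, but whether PAL computes the bounds exactly this way is an assumption, and normal_interval is a hypothetical helper.

```python
from statistics import NormalDist


def normal_interval(forecast, se, level):
    """Return (low, high) of a symmetric normal interval at the given level."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # e.g. level=0.80 uses the 0.90 quantile
    return forecast - z * se, forecast + z * se


# First row of the sample output: FORECAST and SE.
forecast, se = -15.544837, 3.298697

lo80, hi80 = normal_interval(forecast, se, 0.80)
lo95, hi95 = normal_interval(forecast, se, 0.95)

# Compare with the documented LO80/HI80/LO95/HI95 values for that row.
print(round(lo80, 4), round(hi80, 4), round(lo95, 4), round(hi95, 4))
```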
- property predict_hdbprocedure
Returns the generated hdbprocedure for predict.
- set_conn(connection_context)
Set connection context for an ARIMA instance.
- Parameters
- connection_contextConnectionContext
The connection to the SAP HANA system.
- Returns
- None.