ARIMA

class hana_ml.algorithms.pal.tsa.arima.ARIMA(order=None, seasonal_order=None, method=None, include_mean=None, forecast_method=None, output_fitted=None, thread_ratio=None, background_size=None, solver=None, massive=False, group_params=None)

ARIMA (Autoregressive Integrated Moving Average) is a widely used statistical method for forecasting time series data. PAL ARIMA also supports the variants ARIMAX, SARIMA, and SARIMAX, depending on whether seasonal information and external (intervention) data are provided. In ARIMA forecasting, the forecasted values are divided into a 'signal' component and an 'external' component. The 'signal' component comes from the ARIMA model itself and can be further decomposed into trend, seasonal, transitory, and irregular elements. The 'external' component gives the Shapley value of each exogenous variable, computed with LinearSHAP.

Parameters:

order(p, d, q), tuple of int, optional

p: value of the auto-regression order.
d: value of the differencing order.
q: value of the moving average order.

Defaults to (0, 0, 0).

seasonal_order(P, D, Q, s), tuple of int, optional

P: value of the auto-regression order for the seasonal part.
D: value of the differencing order for the seasonal part.
Q: value of the moving average order for the seasonal part.
s: value of the seasonal period.

Defaults to (0, 0, 0, 0).
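
As a rough illustration of what the d and D orders mean, the sketch below applies regular and seasonal differencing to a short list in plain Python. The helper name `difference` and the sample values are illustrative assumptions; this is not how PAL implements ARIMA internally.

```python
def difference(series, lag=1, times=1):
    """Apply lag-`lag` differencing `times` times (hypothetical helper)."""
    out = list(series)
    for _ in range(times):
        # Each pass subtracts the value `lag` steps earlier from every point.
        out = [out[i] - out[i - lag] for i in range(lag, len(out))]
    return out

y = [1, 3, 6, 10, 15, 21, 28, 36]  # made-up sample series

# d = 1: each value minus its immediate predecessor.
d1 = difference(y, lag=1, times=1)

# D = 1 with s = 4: each value minus the value one seasonal period (4 steps) earlier.
sd1 = difference(y, lag=4, times=1)

print(d1)   # [2, 3, 4, 5, 6, 7, 8]
print(sd1)  # [14, 18, 22, 26]
```

Each differencing pass shortens the series by `lag` points, which is one reason high d, D, or s values need longer input series.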

method{'css', 'mle', 'css-mle'}, optional

'css': use the conditional sum of squares.
'mle': use maximum likelihood estimation.
'css-mle': use css to approximate starting values first and then mle to fit.

Defaults to 'css-mle'.

include_meanbool, optional

ARIMA model includes a constant part if True. Valid only when d + D <= 1 (d is defined in order and D is defined in seasonal_order).

Defaults to True if d + D = 0 else False.
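
The default rule for include_mean can be sketched in plain Python as follows. The function name `resolve_include_mean` is hypothetical, and raising an error for an invalid combination is an assumption made for illustration; PAL may handle that case differently.

```python
def resolve_include_mean(order=(0, 0, 0), seasonal_order=(0, 0, 0, 0), include_mean=None):
    """Return the effective include_mean value (hypothetical helper)."""
    d = order[1]           # differencing order from `order`
    D = seasonal_order[1]  # seasonal differencing order from `seasonal_order`
    if include_mean is None:
        # Documented default: True if d + D = 0, else False.
        return d + D == 0
    if include_mean and d + D > 1:
        # Assumption: treat the documented "valid only when d + D <= 1" as an error.
        raise ValueError("include_mean=True is valid only when d + D <= 1")
    return include_mean

print(resolve_include_mean((0, 0, 1), (1, 0, 0, 4)))  # True
print(resolve_include_mean((1, 1, 1)))                # False
```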

forecast_method{'formula_forecast', 'innovations_algorithm'}, optional

Specifies which information is stored for the forecast method used later in predict().

'formula_forecast': compute future series via formula.
'innovations_algorithm': apply the innovations algorithm to compute future series, which requires more of the original information to be stored.

Defaults to 'innovations_algorithm'.

output_fittedbool, optional

Output fitted result and residuals if True.

Defaults to True.

thread_ratiofloat, optional

Adjusts the percentage of available threads to use, from 0 to 1. A value of 0 indicates the use of a single thread, while 1 implies the use of all possible current threads. Values outside the range will be ignored and this function heuristically determines the number of threads to use.

Defaults to -1.

background_sizeint, optional

Number of data points used for ARIMA model explanation in predict(). To obtain explanations, set background_size to a positive value or -1 (auto mode) when initializing the ARIMA instance, and then set show_explainer=True in predict().

Defaults to None (no model explanation).

solver{'bfgs', 'l-bfgs', 'l-bfgs-b'}, optional

Optimization solver. Options are 'bfgs', 'l-bfgs', 'l-bfgs-b'.

Defaults to 'l-bfgs'.

massivebool, optional

Specifies whether or not to activate massive mode.

True : massive mode.
False : single mode.

In massive mode, parameters can be set either through group_params or through the original parameters. Original parameters apply to all groups. However, if any parameter is defined for a group in group_params, none of the original parameter settings apply to that group.

For example, if 'output_fitted' is set in group_params for Group_1 and Group_2, a 'background_size' value set as an original parameter does not apply to Group_1 or Group_2.

Defaults to False.
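
The override rule for massive mode can be sketched in plain Python without a HANA connection. The helper `effective_params`, the group name 'Group_3', and the parameter values are assumptions chosen for illustration; the two dicts mirror what would be passed as original parameters and as group_params.

```python
def effective_params(group, original_params, group_params):
    """Return the settings that apply to `group` (hypothetical helper)."""
    if group_params and group in group_params:
        # Any group-level setting replaces all original parameter settings.
        return dict(group_params[group])
    # Groups without an entry in group_params fall back to the original parameters.
    return dict(original_params)

original_params = {'background_size': 5}              # illustrative value
group_params = {'Group_1': {'output_fitted': False},  # illustrative values
                'Group_2': {'output_fitted': False}}

print(effective_params('Group_1', original_params, group_params))  # {'output_fitted': False}
print(effective_params('Group_3', original_params, group_params))  # {'background_size': 5}
```

To keep a setting such as background_size active for a group that has its own entry, it would need to be repeated inside that group's dict.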

group_paramsdict, optional

If massive mode is activated (massive is True), input data is divided into different groups with different parameters applied.

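group_params is a dict keyed by group ID; each value is a dict of parameter names and values for that group, for example {'Group_1': {'output_fitted': False}}.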

Valid only when massive is True and defaults to None.

Attributes:

model_DataFrame

Model content.

fitted_DataFrame

Fitted values and residuals.

explainer_DataFrame

Explanations of the forecast, decomposed into trend, seasonal, transitory, and irregular components, plus reason codes for the exogenous variables. This attribute is available only if background_size is set when initializing the ARIMA instance and show_explainer=True is passed to predict().

error_msg_DataFrame

Error message. Only valid if massive is True when initializing an 'ARIMA' instance.

permutation_importance_DataFrame

The importance of exogenous variables as determined by permutation importance analysis. This attribute is available only after get_permutation_importance() has been called on a trained model. It is structured as follows:

1st column : PAIR, measure name.

2nd column : NAME, exogenous regressor name.

3rd column : VALUE, the importance of the exogenous regressor.

Methods

`fit`(data[, key, endog, exog, group_key, ...])	Fit the model to the training dataset.
`get_permutation_importance`(data[, model, ...])	Please see Permutation Feature Importance for Time Series for details.
`predict`([data, key, group_key, ...])	Generates time series forecasts based on the fitted model.
`set_conn`(connection_context)	Set connection context for an ARIMA instance.

References

Forecasted values of the ARIMAX model can be locally interpreted (explained), please see:

Examples

ARIMA example:

Input DataFrame df:

>>> df.head(5).collect()
   TIMESTAMP              Y
        1   -0.636126431
        2    3.092508651
        3    -0.73733556
        4   -3.142190983
        5    2.088819813

Create an ARIMA instance:

>>> arima = ARIMA(order=(0, 0, 1), seasonal_order=(1, 0, 0, 4), method='mle', thread_ratio=1.0)

Perform fit():

>>> arima.fit(data=df)

Output:

>>> arima.model_.head(5).collect()
   KEY      VALUE
  p          0
 AR
  d          0
  q          1
 MA  -0.141073

>>> arima.fitted_.head(3).collect()
   TIMESTAMP     FITTED    RESIDUALS
0          1   0.023374    -0.659500
1          2   0.114596     2.977913
2          3  -0.396567    -0.340769

Perform predict():

>>> result = arima.predict(forecast_method='innovations_algorithm', forecast_length=10)

Output:

>>> result.head(3).collect()
  TIMESTAMP   FORECAST           SE        LO80        HI80         LO95        HI95
0         0   1.557783     1.302436   -0.111357    3.226922    -0.994945    4.110511
1         1   3.765987     1.315333    2.080320    5.451654     1.187983    6.343992
2         2  -0.565599     1.315333   -2.251266    1.120068    -3.143603    2.012406
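
The interval columns in the sample output above are numerically consistent with FORECAST ± z·SE under a normal approximation. The sketch below reproduces the first row with the Python standard library. This is an observation about the sample values, not a documented guarantee of how PAL computes the bounds, and the helper name `interval` is hypothetical.

```python
from statistics import NormalDist

def interval(forecast, se, level):
    """Two-sided normal interval at the given confidence level (hypothetical helper)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # e.g. about 1.2816 for 80%, 1.96 for 95%
    return forecast - z * se, forecast + z * se

# FORECAST and SE taken from the first row of the sample output.
forecast, se = 1.557783, 1.302436

lo80, hi80 = interval(forecast, se, 0.80)
lo95, hi95 = interval(forecast, se, 0.95)

print(round(lo80, 4), round(hi80, 4))  # -0.1114 3.2269
print(round(lo95, 4), round(hi95, 4))  # -0.9949 4.1105
```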

To see the decomposition of the forecast, set background_size when initializing the ARIMA instance and pass show_explainer=True to predict():

>>> arima = ARIMA(order=(0, 0, 1),
                  seasonal_order=(1, 0, 0, 4),
                  method='mle',
                  thread_ratio=1.0,
                  background_size=10)
>>> arima.fit(data=df)
>>> result = arima.predict(forecast_method='innovations_algorithm',
                           forecast_length=3,
                           allow_new_index=False,
                           show_explainer=True)

Show the explainer_ of the ARIMA instance:

>>> arima.explainer_.head(3).collect()
  ID     TREND SEASONAL TRANSITORY IRREGULAR                                          EXOGENOUS
0  0  1.179043     None       None      None  [{"attr":"X","val":-0.49871412549199997,"pct":...
1  1  1.252138     None       None      None  [{"attr":"X","val":-0.27390052549199997,"pct":...
2  2  1.362164     None       None      None  [{"attr":"X","val":-0.19046313238292013,"pct":...
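
Each EXOGENOUS cell in the output above appears to hold a JSON array of objects with "attr", "val", and "pct" keys. A minimal sketch of unpacking such a cell with the standard library follows. The cell content below is hypothetical (the attribute names and all numbers are made up), since the sample output is truncated.

```python
import json

# Hypothetical cell content: keys follow the sample output, values are made up.
cell = '[{"attr":"X1","val":-0.5,"pct":62.5},{"attr":"X2","val":0.3,"pct":37.5}]'

contributions = json.loads(cell)

# Map each exogenous variable name to its contribution value.
by_attr = {c["attr"]: c["val"] for c in contributions}

print(by_attr)  # {'X1': -0.5, 'X2': 0.3}
```

After calling collect(), the same parsing could be applied to each row of the EXOGENOUS column.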

ARIMAX example:

Input DataFrame df:

>>> df.head(5).collect()
   ID                   Y                   X
 1                 1.2                 0.8
 2    1.34845613096197                 1.2
 3    1.32261090809898    1.34845613096197
 4    1.38095306748554    1.32261090809898
 5    1.54066648969168    1.38095306748554

Create an ARIMAX instance:

>>> arimax = ARIMA(order=(1, 0, 1), method='mle', thread_ratio=1.0)

Perform fit():

>>> arimax.fit(data=df, endog='Y')

Output:

>>> arimax.model_.head(5).collect()
   KEY      VALUE
  p          1
 AR   0.302207
  d          0
  q          1
 MA   0.291575

>>> arimax.fitted_.head(3).collect()
  TIMESTAMP     FITTED    RESIDUALS
0         1   1.182363     0.017637
1         2   1.416213    -0.067757
2         3   1.453572    -0.130961

Perform predict():

>>> df2.head(5).collect()
  TIMESTAMP          X
       1   0.800000
       2   1.200000
       3   1.348456
       4   1.322611
       5   1.380953

>>> result = arimax.predict(data=df2,
                            forecast_method='innovations_algorithm',
                            forecast_length=5)

Output:

>>> result.head(3).collect()
   TIMESTAMP   FORECAST          SE        LO80         HI80        LO95        HI95
0          0   1.195952    0.093510    1.076114     1.315791    1.012675    1.379229
1          1   1.411284    0.108753    1.271912     1.550657    1.198132    1.624436
2          2   1.491856    0.110040    1.350835     1.632878    1.276182    1.707530

fit(data, key=None, endog=None, exog=None, group_key=None, group_params=None, categorical_variable=None)

Fit the model to the training dataset.

Parameters:

dataDataFrame

Input data, which must contain at least two columns: key and endog.

ARIMAX is also supported; it additionally requires exogenous variable columns.

keystr, optional

The timestamp column of data. The type of key column should be INTEGER, TIMESTAMP, DATE or SECONDDATE.

In massive mode, defaults to the first non-group-key column of data if the index columns of data are not provided. Otherwise, defaults to the second index column of data, with the first index column used as group_key.

endogstr, optional

The endogenous variable, i.e. time series. The type of endog column could be INTEGER, DOUBLE or DECIMAL(p,s).

In single mode, defaults to the first non-ID column. In massive mode, defaults to the first non group_key, non key column.
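
The default choice of endog described above amounts to taking the first column that is neither the key nor the group key. A plain-Python sketch follows; the helper `default_endog` and the column names are assumptions for illustration only.

```python
def default_endog(columns, key, group_key=None):
    """Pick the default endog column (hypothetical helper)."""
    excluded = {key, group_key}
    for col in columns:
        if col not in excluded:
            # First column that is not the key (or group key) is the time series.
            return col
    raise ValueError("no candidate column for endog")

# Single mode: first non-key column.
print(default_endog(['TIMESTAMP', 'Y', 'X'], key='TIMESTAMP'))  # Y

# Massive mode: first column that is neither group_key nor key.
print(default_endog(['GROUP_ID', 'TIMESTAMP', 'Y', 'X'],
                    key='TIMESTAMP', group_key='GROUP_ID'))     # Y
```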

exogstr or a list of str, optional

An optional array of exogenous variables. The type of exog column could be INTEGER, DOUBLE or DECIMAL(p,s).

Valid only for ARIMAX.

Defaults to None. Please set this parameter explicitly if you have exogenous variables.

group_keystr, optional

The column of group_key. Data type can be INT or NVARCHAR/VARCHAR. This parameter is only valid when massive is True in class instance initialization.

Defaults to the first column of data if the index columns of data are not provided. Otherwise, defaults to the first index column.

group_paramsdict, optional

If massive mode is activated (massive is True in class instance initialization), input data is divided into different groups with different parameters applied.

The dict is keyed by group ID; each value is a dict of parameter names and values for that group.

Valid only when massive is True in class instance initialization (i.e. self.massive is True).

Defaults to None.

categorical_variablestr or list of str, optional

Specifies which INTEGER columns should be treated as categorical.

Other INTEGER columns will be treated as continuous.

Defaults to None.

Returns:

A fitted object of class "ARIMA".

predict(data=None, key=None, group_key=None, group_params=None, forecast_method=None, forecast_length=None, allow_new_index=False, show_explainer=False, thread_ratio=None, top_k_attributions=None, trend_mod=None, trend_width=None, seasonal_width=None)

Generates time series forecasts based on the fitted model.

Parameters:

dataDataFrame, optional

Index and exogenous variables for forecast. For ARIMAX only.

Defaults to None.

keystr, optional

The timestamp column of data. The data type of the key column should be INTEGER, TIMESTAMP, DATE, or SECONDDATE. For ARIMAX only.

In massive mode, defaults to the first non-group key column of data if the index columns of data are not provided. Otherwise, defaults to the second of index columns of data and the first column of index columns is group_key.

group_keystr, optional

The column of group_key. Data type can be INT or NVARCHAR/VARCHAR. This parameter is only valid when massive is True.

Defaults to the first column of data if the index columns of data are not provided. Otherwise, defaults to the first column of index columns.

group_paramsdict, optional

If massive mode is activated (massive is True in class instance initialization), input data is divided into different groups with different parameters applied.

The dict is keyed by group ID; each value is a dict of parameter names and values for that group.

Valid only when self.massive is True.

Defaults to None.

forecast_method{'formula_forecast', 'innovations_algorithm', 'truncation_algorithm'}, optional

Specify the forecast method.

'formula_forecast': forecast via formula.

'innovations_algorithm': apply innovations algorithm to forecast.

'truncation_algorithm': a forecast method much faster than the innovations algorithm when the AR representation of the ARIMA model can be truncated to finite order.

Defaults to 'innovations_algorithm' if, in class initialization, the parameter forecast_method is not set, or set as 'innovations_algorithm'; otherwise defaults to 'formula_forecast'.
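
The default rule above depends on what was passed at initialization. A plain-Python sketch of that resolution follows; the function name `resolve_forecast_method` and its argument names are hypothetical.

```python
def resolve_forecast_method(init_method=None, predict_method=None):
    """Return the forecast method predict() would use (hypothetical helper)."""
    if predict_method is not None:
        # An explicit value passed to predict() is used as given.
        return predict_method
    if init_method in (None, 'innovations_algorithm'):
        # Not set, or set to 'innovations_algorithm', at initialization.
        return 'innovations_algorithm'
    # Otherwise the model was initialized with 'formula_forecast'.
    return 'formula_forecast'

print(resolve_forecast_method())                                # innovations_algorithm
print(resolve_forecast_method(init_method='formula_forecast'))  # formula_forecast
```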

forecast_lengthint, optional

Number of points to forecast. Valid only when data is None.

In ARIMAX, the forecast length is the same as the length of the input predict data.

Defaults to None.

allow_new_indexbool, optional

Whether to recalculate and output the index column of the forecast result based on the type of the fitting data's index column.

True: The index column in the forecast result will be recalculated to match the type and sequence of the fitting data's index column.
False: The forecast result will output the original result from HANA PAL, which may use an integer index even if the fitting data's index column is a timestamp.

Defaults to False.

show_explainerbool, optional

Indicates whether predict() should also compute explanations of the forecast. Only valid when background_size is set when initializing the ARIMA instance.

If True, the contributions of trend, seasonal, transitory, irregular, and exogenous are shown in an attribute called explainer_ of the ARIMA / auto ARIMA instance.

Defaults to False.

thread_ratiofloat, optional

Adjusts the percentage of available threads to use, from 0 to 1. A value of 0 indicates the use of a single thread, while 1 implies the use of all possible current threads. Values outside the range will be ignored and this function heuristically determines the number of threads to use. Valid only when show_explainer is True.

Defaults to -1.

top_k_attributionsint, optional

Specifies how many of the attributes with the largest contributions are output. Attributes with zero contribution are not output. Valid only when show_explainer is True.

Defaults to 10.
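
The selection described for top_k_attributions can be sketched as a filter followed by a ranking. The helper `top_k` and the sample data are hypothetical, and ranking by absolute contribution is an assumption; the text above only says "largest contribution".

```python
def top_k(contributions, k=10):
    """Keep the k largest non-zero contributions (hypothetical helper)."""
    # Attributes that contribute exactly zero are dropped first.
    nonzero = [c for c in contributions if c["val"] != 0]
    # Assumption: "largest" is measured by absolute value.
    nonzero.sort(key=lambda c: abs(c["val"]), reverse=True)
    return nonzero[:k]

sample = [{"attr": "X1", "val": 0.2},   # made-up contributions
          {"attr": "X2", "val": 0.0},
          {"attr": "X3", "val": -0.7}]

print([c["attr"] for c in top_k(sample, k=2)])  # ['X3', 'X1']
```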

trend_modfloat, optional

The real AR roots with inverse modulus larger than this parameter will be integrated into the trend component. Valid only when show_explainer is True. Cannot be smaller than 0.

Defaults to 0.4.

trend_widthfloat, optional

Specifies the bandwidth of the spectrum of the trend component in units of rad. Valid only when show_explainer is True. Cannot be smaller than 0.

Defaults to 0.035.

seasonal_widthfloat, optional

Specifies the bandwidth of the spectrum of the seasonal component in units of rad. Valid only when show_explainer is True. Cannot be smaller than 0.

Defaults to 0.035.

Returns:

DataFrame 1: Forecasted values.
DataFrame 2 (optional): The explanations with decomposition of trend, seasonal, transitory, irregular, and reason code of exogenous variables. Only valid if show_explainer is True.
DataFrame 3 (optional): Error message. Only valid if massive is True.

get_permutation_importance(data, model=None, key=None, endog=None, exog=None, repeat_time=None, random_state=None, thread_ratio=None, partition_ratio=None, regressor_top_k=None, accuracy_measure=None, ignore_zero=None)

Please see Permutation Feature Importance for Time Series for details.

set_conn(connection_context)

Set connection context for an ARIMA instance.

Parameters:

connection_contextConnectionContext

The connection to the SAP HANA system.

Returns:

None.

Inherited Methods from PALBase

In addition to the methods described above, the ARIMA class inherits methods from the PALBase class. Please refer to PAL Base for more details.