VectorARIMA

class hana_ml.algorithms.pal.tsa.vector_arima.VectorARIMA(order=None, seasonal_order=None, model_type=None, search_method=None, lag_num=None, max_p=None, max_q=None, max_seasonal_p=None, max_seasonal_q=None, max_lag_num=None, init_guess=None, information_criterion=None, include_mean=None, max_iter=None, finite_diff_accuracy=None, displacement=None, ftol=None, gtol=None, calculate_hessian=None, calculate_irf=None, irf_lags=None, alpha=None, output_fitted=None, thread_ratio=None)

The vector autoregressive moving average (VARMA) model is a vector form of the autoregressive integrated moving average (ARIMA) model. It can be used to examine the relationships among several variables in multivariate time series analysis, whereas ARIMA is used for univariate time series.

Parameters:
order : (p, d, q), tuple of int, optional

Indicates the order (p, d, q).

  • p: value of the autoregression order. -1 indicates auto and >=0 is user-defined.

  • d: value of the differencing order.

  • q: value of the moving average order. -1 indicates auto and >=0 is user-defined.

Defaults to (-1, 0, -1).

seasonal_order : (P, D, Q, s), tuple of int, optional

Indicates the seasonal order (P, D, Q, s).

  • P: value of the autoregression order for the seasonal part. -1 indicates auto and >=0 is user-defined.

  • D: value of the differencing order for the seasonal part.

  • Q: value of the moving average order for the seasonal part. -1 indicates auto and >=0 is user-defined.

  • s: value of the seasonal period. -1 indicates auto and >=0 is user-defined.

Defaults to (-1, 0, -1, 0).
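
The differencing orders d and D can be illustrated with the textbook ARIMA operations: d applies a lag-1 difference and D applies a lag-s difference. The sketch below uses a hypothetical helper named difference and a made-up toy series. It shows the conventional meaning of the orders only and is not taken from the PAL implementation.

```python
def difference(series, lag=1, times=1):
    """Apply x[t] - x[t - lag] to `series`, `times` times in a row."""
    out = list(series)
    for _ in range(times):
        out = [out[i] - out[i - lag] for i in range(lag, len(out))]
    return out

# Toy series (assumption): linear trend plus a pattern repeating every 4 steps.
pattern = [0.0, 2.0, 1.0, 3.0]
series = [0.5 * t + pattern[t % 4] for t in range(12)]

regular = difference(series, lag=1, times=1)   # corresponds to d = 1
seasonal = difference(series, lag=4, times=1)  # corresponds to D = 1 with s = 4
both = difference(seasonal, lag=1, times=1)    # d = 1 applied after D = 1

print(regular)
print(seasonal)
print(both)
```

In this toy case the seasonal difference removes the repeating pattern and leaves a constant, and the additional regular difference removes that constant.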

model_type : {'VAR', 'VMA', 'VARMA'}, optional

The model type.

Defaults to 'VARMA'.

search_method : {'eccm', 'grid_search'}, optional

Specifies the method used to search for the orders of the model. 'eccm' is valid only when the seasonal period is less than 1.

Defaults to 'grid_search'.

lag_num : int, optional

The lag number of explanatory variables. Valid only when model_type is 'VAR'.

Defaults to 4.

max_p : int, optional

The maximum value of vector AR order p.

Defaults to 6 if model_type is 'VAR' or if model_type is 'VARMA' and search_method is 'eccm'.

Defaults to 2 if model_type is 'VARMA' and search_method is 'grid_search'.

max_q : int, optional

The maximum value of vector MA order q.

Defaults to 8 if model_type is 'VMA'.

Defaults to 5 if model_type is 'VARMA' and search_method is 'eccm'.

Defaults to 2 if model_type is 'VARMA' and search_method is 'grid_search'.
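
Because the defaults of max_p and max_q depend on both model_type and search_method, it can help to see them as a lookup table. The sketch below only restates the defaults listed above. The names MAX_P_DEFAULTS, MAX_Q_DEFAULTS and documented_default are hypothetical and not part of hana_ml, and combinations the text does not mention return None.

```python
# Documented defaults keyed by (model_type, search_method).
# A search_method of None means the default applies regardless of search method.
MAX_P_DEFAULTS = {
    ("VAR", None): 6,
    ("VARMA", "eccm"): 6,
    ("VARMA", "grid_search"): 2,
}
MAX_Q_DEFAULTS = {
    ("VMA", None): 8,
    ("VARMA", "eccm"): 5,
    ("VARMA", "grid_search"): 2,
}

def documented_default(table, model_type="VARMA", search_method="grid_search"):
    """Return the documented default, or None if the text lists none."""
    if (model_type, None) in table:
        return table[(model_type, None)]
    return table.get((model_type, search_method))

print(documented_default(MAX_P_DEFAULTS, "VAR"))
print(documented_default(MAX_P_DEFAULTS, "VARMA", "eccm"))
print(documented_default(MAX_Q_DEFAULTS, "VMA"))
print(documented_default(MAX_Q_DEFAULTS, "VARMA", "grid_search"))
```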

max_seasonal_p : int, optional

The maximum value of seasonal vector AR order P.

Defaults to 3 if model_type is 'VAR'.

Defaults to 1 if model_type is 'VARMA' and search_method is 'grid_search'.

max_seasonal_q : int, optional

The maximum value of seasonal vector MA order Q.

Defaults to 1.

max_lag_num : int, optional

The maximum lag number of explanatory variables. Valid only when model_type is 'VAR'.

Defaults to 4.

init_guess : {'ARMA', 'VAR'}, optional

The model used as initial estimation for VARMA. Valid only for VARMA.

Defaults to 'VAR'.

information_criterion : {'AIC', 'BIC'}, optional

Information criteria for order specification.

Defaults to 'AIC'.
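
As background on how an information criterion ranks candidate orders, the sketch below uses the textbook definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L and picks the candidate with the smallest value. The log-likelihoods and parameter counts are made-up numbers, the function names are hypothetical, and PAL may scale or normalize its criteria differently, so values reported by the library need not be on this scale.

```python
import math

def aic(log_likelihood, n_params):
    """Textbook Akaike information criterion."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Textbook Bayesian information criterion."""
    return n_params * math.log(n_obs) - 2 * log_likelihood

# Hypothetical candidates: (p, q) -> (log-likelihood, number of parameters).
candidates = {
    (1, 0): (-100.0, 5),
    (2, 0): (-97.0, 9),
    (1, 1): (-98.5, 9),
}
n_obs = 40

best_by_aic = min(candidates, key=lambda o: aic(*candidates[o]))
best_by_bic = min(candidates, key=lambda o: bic(*candidates[o], n_obs))
print(best_by_aic, best_by_bic)
```

BIC penalizes additional parameters more heavily than AIC whenever ln n exceeds 2, so it tends to select smaller orders on the same data.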

include_mean : bool, optional

ARIMA model includes a constant part if True.

Valid only when d + D <= 1.

Defaults to True if d + D = 0 else False.
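
The validity condition and the default of include_mean both depend on the total differencing order d + D. A minimal sketch of the two documented rules, using hypothetical helper names that are not part of hana_ml:

```python
def include_mean_is_valid(d, D):
    """Documented validity condition: include_mean applies only when d + D <= 1."""
    return d + D <= 1

def default_include_mean(d, D):
    """Documented default: True when d + D is 0, otherwise False."""
    return d + D == 0

for d, D in [(0, 0), (1, 0), (1, 1)]:
    print(d, D, include_mean_is_valid(d, D), default_include_mean(d, D))
```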

max_iter : int, optional

Maximum number of iterations of the L-BFGS-B optimizer. Valid only for VMA and VARMA.

Defaults to 200.

finite_diff_accuracy : int, optional

Polynomial order of the finite difference used to approximate the gradient of the objective function.

The valid range is from 1 to 4.

Defaults to 1.

displacement : float, optional

The step length for finite-difference method.

Valid only for VMA and VARMA.

Defaults to 2.2e-6.
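
To show how a finite-difference step length and accuracy order interact, the sketch below compares the textbook forward difference (first-order accurate) with the central difference (second-order accurate) at the documented default step of 2.2e-6. Mapping these two stencils to finite_diff_accuracy values 1 and 2 is an assumption, and the test function is arbitrary. The stencils PAL actually uses are not described here.

```python
def forward_difference(f, x, h):
    """First-order accurate approximation of f'(x)."""
    return (f(x + h) - f(x)) / h

def central_difference(f, x, h):
    """Second-order accurate approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def cube(x):
    # Arbitrary smooth test function; its exact derivative is 3 * x**2.
    return x ** 3

step = 2.2e-6          # documented default of displacement
exact = 3 * 2.0 ** 2   # derivative of cube at x = 2

forward_error = abs(forward_difference(cube, 2.0, step) - exact)
central_error = abs(central_difference(cube, 2.0, step) - exact)
print(forward_error, central_error)
```

For the same step length, the higher-order stencil gives a noticeably smaller error at the cost of one more function evaluation per coordinate.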

ftol : float, optional

Tolerance for objective convergence test.

Valid only for VMA and VARMA.

Defaults to 1e-5.

gtol : float, optional

Tolerance for gradient convergence test.

Valid only for VMA and VARMA.

Defaults to 1e-5.

calculate_hessian : bool, optional

Specifies whether to calculate the Hessian matrix.

VMA and VARMA will output standard error of parameter estimates only when calculate_hessian is True.

Defaults to False.

calculate_irf : bool, optional

Specifies whether to calculate impulse response function.

Defaults to False.

irf_lags : int, optional

The number of lags of the IRF to be calculated.

Valid only when calculate_irf is True.

Defaults to 8.
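
For intuition about what an impulse response function contains, consider the textbook VAR(1) model y[t] = A y[t-1] + e[t], where the response at lag h to a unit shock is the matrix power A^h. The sketch below computes these matrices for a made-up coefficient matrix. The function names and the n_lags argument are hypothetical, the sketch covers endogenous variables only, and how PAL counts lags or handles exogenous variables is not specified here.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def var1_impulse_responses(coef, n_lags):
    """Return [A^0, A^1, ..., A^(n_lags - 1)] for a VAR(1) coefficient matrix A."""
    size = len(coef)
    response = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    responses = []
    for _ in range(n_lags):
        responses.append(response)
        response = matmul(coef, response)
    return responses

# Made-up coefficients: row i, column j is the effect of variable j on variable i.
A = [[0.5, 0.1],
     [0.0, 0.4]]
irf = var1_impulse_responses(A, n_lags=3)
for lag, matrix in enumerate(irf):
    print(lag, matrix)
```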

alpha : float, optional

Type-I error used in the Ljung-Box tests and eccm.

Defaults to 0.05.

output_fitted : bool, optional

Output fitted result and residuals if True.

Defaults to True.

thread_ratio : float, optional

Adjusts the percentage of available threads to use, from 0 to 1. A value of 0 indicates the use of a single thread, while 1 implies the use of all possible current threads. Values outside the range will be ignored and this function heuristically determines the number of threads to use.

Defaults to -1.
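
The default of -1 lies outside the valid range, so by the rule above the thread count is chosen heuristically unless a value between 0 and 1 is given. A small sketch that paraphrases the documented cases, using a hypothetical helper that is not part of hana_ml:

```python
def describe_thread_ratio(thread_ratio=-1):
    """Paraphrase the documented meaning of a thread_ratio value."""
    if thread_ratio == 0:
        return "single thread"
    if 0 < thread_ratio <= 1:
        return "up to {:.0%} of the available threads".format(thread_ratio)
    # Out-of-range values, including the default -1, are ignored.
    return "thread count chosen heuristically"

for value in (0, 0.5, 1.0, -1, 2):
    print(value, "->", describe_thread_ratio(value))
```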

Examples

Input DataFrame df:

>>> df.collect()
   TIMESTAMP   Y1        X       Y2
0          1  9.8      6.4      8.2
...
38        39  7.2      5.4      5.5
39        40  7.6      5.5      5.8

Create a VectorARIMA instance:

>>> varima = VectorARIMA(model_type='VAR', calculate_irf=True)

Perform fit():

>>> varima.fit(data=df, endog=['Y1', 'Y2'], exog='X')

Output:

>>> varima.model_.head(5).collect()
   CONTENT_INDEX                                      CONTENT_VALUE
0              0                                    {"model":"VAR"}
1              1                                 {"exogCols":["X"]}
2              2                          {"endogCols":["Y1","Y2"]}
3              3  {"D":0,"P":0,"c":1,"d":0,"k":2,"m":2,"nT":40,"...
4              4                        {"AIC":-6.6759375491341144}
>>> varima.fitted_.head(3).collect()
  NAMECOL    IDX   FITTING      RESIDUAL
0      Y1      1       NaN           NaN
1      Y1      2       NaN           NaN
2      Y1      3  9.622092      0.177908
>>> varima.irf_.head(3).collect()
  COL1    COL2    IDX   RESPONSE
0   Y1      X       0   0.243569
1   Y1      X       1   0.139749
2   Y1      X       2  -0.351429

Perform predict():

>>> pred_df.collect()
  TIMESTAMP           X
0        41         5.2
...
4        45         5.7
>>> result_dict, result_all = varima.predict(pred_df)

Output:

>>> result_dict['Y1'].collect()
   IDX  FORECAST          SE        LO95        HI95
0   41  7.577883    0.172352    7.240072    7.915694
...
4   45  6.773185    0.347997    6.091110    7.455259
>>> result_dict['Y2'].collect()
   IDX  FORECAST          SE        LO95        HI95
0   41  5.822953    0.171752    5.486320    6.159586
...
4   45  5.141598    0.298299    4.556933    5.726263
>>> result_all.collect()
   COLNAME     IDX  FORECAST          SE        LO95        HI95
0       Y1      41  7.577883    0.172352    7.240072    7.915694
1       Y1      42  7.202759    0.233421    6.745254    7.660264
...
9       Y2      45  5.141598    0.298299    4.556933    5.726263
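
The LO95 and HI95 values printed above appear consistent with FORECAST -/+ 1.96 * SE, the usual normal-approximation interval. This relation is inferred from the printed numbers and is not stated by the documentation, so treat the multiplier in the sketch below as an assumption. The rows are copied from the Y1 and Y2 output above.

```python
Z95 = 1.96  # assumed multiplier for a two-sided 95% normal interval

def interval(forecast, se, z=Z95):
    """Return (lower, upper) as forecast -/+ z * se."""
    return forecast - z * se, forecast + z * se

# (FORECAST, SE, LO95, HI95) rows copied from the example output.
rows = [
    (7.577883, 0.172352, 7.240072, 7.915694),  # Y1, IDX 41
    (6.773185, 0.347997, 6.091110, 7.455259),  # Y1, IDX 45
    (5.822953, 0.171752, 5.486320, 6.159586),  # Y2, IDX 41
    (5.141598, 0.298299, 4.556933, 5.726263),  # Y2, IDX 45
]

max_gap = 0.0
for forecast, se, lo95, hi95 in rows:
    lower, upper = interval(forecast, se)
    max_gap = max(max_gap, abs(lower - lo95), abs(upper - hi95))

print(max_gap)  # largest difference between recomputed and printed bounds
```
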
Attributes:
model_ : DataFrame

Model content.

fitted_ : DataFrame

Fitted values and residuals.

irf_ : DataFrame

Impulse response function.

Methods

fit(data[, key, endog, exog])

Fit the model to the training dataset.

get_model_metrics()

Get the model metrics.

get_score_metrics()

Get the score metrics.

predict([data, key, forecast_length, ...])

Generates time series forecasts based on the fitted model.

set_conn(connection_context)

Set connection context for a VectorARIMA instance.

fit(data, key=None, endog=None, exog=None)

Fit the model to the training dataset.

Parameters:
data : DataFrame

DataFrame containing the key column, the endogenous variables, and optionally exogenous variables.

key : str, optional

The timestamp column of data. The type of key column should be INTEGER, TIMESTAMP, DATE, or SECONDDATE.

Defaults to the first column of data if the index column of data is not provided. Otherwise, defaults to the index of data.

endog : list of str, optional

The endogenous variables, i.e. time series. The type of endog column can be INTEGER, DOUBLE or DECIMAL(p,s).

Defaults to all non-key and non-exog columns of data if not provided.

exog : str or a list of str, optional

An optional array of exogenous variables. The type of exog column can be INTEGER, DOUBLE or DECIMAL(p,s).

Defaults to None.

Returns:
A fitted object of class "VectorARIMA".
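
The documented default for endog (all columns that are neither the key nor exogenous) can be sketched in plain Python. The helper name default_endog is hypothetical and not part of hana_ml. The column names come from the example DataFrame shown earlier.

```python
def default_endog(columns, key, exog=None):
    """Return the columns that are neither the key nor exogenous variables."""
    if exog is None:
        exog_columns = []
    elif isinstance(exog, str):
        exog_columns = [exog]
    else:
        exog_columns = list(exog)
    return [c for c in columns if c != key and c not in exog_columns]

columns = ["TIMESTAMP", "Y1", "X", "Y2"]
print(default_endog(columns, key="TIMESTAMP", exog="X"))
print(default_endog(columns, key="TIMESTAMP"))
```

With exog set to 'X', the remaining columns Y1 and Y2 are treated as endogenous. Without exog, X is also treated as endogenous.
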
set_conn(connection_context)

Set connection context for a VectorARIMA instance.

Parameters:
connection_context : ConnectionContext

The connection to the SAP HANA system.

Returns:
None.
predict(data=None, key=None, forecast_length=None, allow_new_index=False)

Generates time series forecasts based on the fitted model.

Parameters:
data : DataFrame, optional

Index and exogenous variables for forecast. The structure is as follows:

  • First column: Index (ID), int.

  • Other columns: exogenous variables, with type INTEGER, DOUBLE or DECIMAL(p,s).

Defaults to None.

key : str, optional

The timestamp column of data. The type of key column is int.

Defaults to the first column of data if the index column of data is not provided. Otherwise, defaults to the index column of data.

forecast_length : int, optional

Number of points to forecast. Valid only when data is not provided.

Defaults to None.

allow_new_index : bool, optional

Indicates whether a new index column is allowed in the result.

  • True: return the result with new integer or timestamp index column.

  • False: return the result with index column starting from 0.

Defaults to False.

Returns:
Dict of DataFrames

Forecasted values for each endogenous variable, keyed by column name. Each DataFrame is structured as follows:

  • ID: type INTEGER, timestamp.

  • FORECAST: type DOUBLE, forecast value.

  • SE: type DOUBLE, standard error.

  • LO95: type DOUBLE, low 95% value.

  • HI95: type DOUBLE, high 95% value.

DataFrame

The aggregated forecasted values for all endogenous variables, structured as follows:

  • COLNAME: type NVARCHAR(5000), name of endogs.

  • ID: type INTEGER, timestamp.

  • FORECAST: type DOUBLE, forecast value.

  • SE: type DOUBLE, standard error.

  • LO95: type DOUBLE, low 95% value.

  • HI95: type DOUBLE, high 95% value.
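
The two return values carry the same forecasts in different shapes: the aggregated result is a long table with one row per endogenous variable and index, and the dictionary holds one table per variable. The plain-Python sketch below illustrates that relationship with tuples instead of DataFrames. The three rows are taken from the aggregated example output, and the tuple layout and function name are assumptions made for illustration.

```python
from collections import defaultdict

# (COLNAME, index, FORECAST) rows taken from the aggregated example output.
aggregated = [
    ("Y1", 41, 7.577883),
    ("Y1", 42, 7.202759),
    ("Y2", 45, 5.141598),
]

def split_by_column(rows):
    """Group long-format rows into one list per endogenous variable."""
    grouped = defaultdict(list)
    for colname, index, forecast in rows:
        grouped[colname].append((index, forecast))
    return dict(grouped)

per_variable = split_by_column(aggregated)
print(sorted(per_variable))
print(per_variable["Y1"])
```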

get_model_metrics()

Get the model metrics.

Returns:
DataFrame

The model metrics.

get_score_metrics()

Get the score metrics.

Returns:
DataFrame

The score metrics.

Inherited Methods from PALBase

Besides the methods described above, the VectorARIMA class also inherits methods from the PALBase class. Refer to PAL Base for more details.