VectorARIMA
- class hana_ml.algorithms.pal.tsa.vector_arima.VectorARIMA(order=None, seasonal_order=None, model_type=None, search_method=None, lag_num=None, max_p=None, max_q=None, max_seasonal_p=None, max_seasonal_q=None, max_lag_num=None, init_guess=None, information_criterion=None, include_mean=None, max_iter=None, finite_diff_accuracy=None, displacement=None, ftol=None, gtol=None, calculate_hessian=None, calculate_irf=None, irf_lags=None, alpha=None, output_fitted=None, thread_ratio=None)
The vector autoregressive moving average (VARMA) model is the vector form of the autoregressive integrated moving average (ARIMA) model. Whereas ARIMA is used for univariate time series, VARMA can be used to examine the relationships among several variables in multivariate time series analysis.
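To make the vector form concrete, the sketch below iterates a two-variable VAR(1) model, y_t = c + A y_{t-1}, to produce point forecasts. Each forecast of Y1 depends on the previous values of both Y1 and Y2, which is the cross-variable relationship a univariate ARIMA cannot capture. It is a minimal sketch under stated assumptions: the intercepts c and matrix A are hypothetical values (not PAL estimates), the noise term is omitted as for a point forecast, and neither an MA part nor any estimation is shown.

```python
# A minimal sketch of the two-variable VAR(1) point-forecast recursion
#     y_t = c + A @ y_{t-1}
# The intercepts c and coefficient matrix A are hypothetical illustration
# values, not estimates produced by PAL.
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def var1_step(c: Vector, A: Matrix, y_prev: Vector) -> Vector:
    """Return c + A @ y_prev, i.e. one step of the VAR(1) recursion."""
    return [c[i] + sum(A[i][j] * y_prev[j] for j in range(len(y_prev)))
            for i in range(len(c))]


def var1_forecast(c: Vector, A: Matrix, y_last: Vector, steps: int) -> List[Vector]:
    """Iterate the recursion `steps` times, feeding each forecast back in."""
    forecasts: List[Vector] = []
    y = y_last
    for _ in range(steps):
        y = var1_step(c, A, y)
        forecasts.append(y)
    return forecasts


c = [1.0, 0.5]          # hypothetical intercepts for (Y1, Y2)
A = [[0.5, 0.1],        # Y1 responds to its own lag and to the lag of Y2
     [0.2, 0.4]]        # Y2 responds to the lag of Y1 and to its own lag
# Start from the last observed (Y1, Y2) values of the example data below.
for h, y in enumerate(var1_forecast(c, A, [7.6, 5.8], steps=3), start=1):
    print(h, [round(v, 4) for v in y])
```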
- Parameters:
- order : (p, d, q), tuple of int, optional
Indicates the order (p, d, q).
p: value of the autoregressive order. -1 indicates auto and >=0 is user-defined.
d: value of the differencing order.
q: value of the moving average order. -1 indicates auto and >=0 is user-defined.
Defaults to (-1, 0, -1).
- seasonal_order : (P, D, Q, s), tuple of int, optional
Indicates the seasonal order (P, D, Q, s).
P: value of the autoregressive order for the seasonal part. -1 indicates auto and >=0 is user-defined.
D: value of the differencing order for the seasonal part.
Q: value of the moving average order for the seasonal part. -1 indicates auto and >=0 is user-defined.
s: value of the seasonal period. -1 indicates auto and >=0 is user-defined.
Defaults to (-1, 0, -1, 0).
- model_type : {'VAR', 'VMA', 'VARMA'}, optional
The model type.
Defaults to 'VARMA'.
- search_method : {'eccm', 'grid_search'}, optional
Specifies the method used to search for the model orders. 'eccm' is valid only when the seasonal period is less than 1.
Defaults to 'grid_search'.
- lag_num : int, optional
The lag number of explanatory variables. Valid only when model_type is 'VAR'.
Defaults to 4.
- max_p : int, optional
The maximum value of vector AR order p.
Defaults to 6 if model_type is 'VAR' or if model_type is 'VARMA' and search_method is 'eccm'.
Defaults to 2 if model_type is 'VARMA' and search_method is 'grid_search'.
- max_q : int, optional
The maximum value of vector MA order q.
Defaults to 8 if model_type is 'VMA'.
Defaults to 5 if model_type is 'VARMA' and search_method is 'eccm'.
Defaults to 2 if model_type is 'VARMA' and search_method is 'grid_search'.
- max_seasonal_p : int, optional
The maximum value of seasonal vector AR order P.
Defaults to 3 if model_type is 'VAR'.
Defaults to 1 if model_type is 'VARMA' and search_method is 'grid_search'.
- max_seasonal_q : int, optional
The maximum value of seasonal vector MA order Q.
Defaults to 1.
- max_lag_num : int, optional
The maximum lag number of explanatory variables. Valid only when model_type is 'VAR'.
Defaults to 4.
- init_guess : {'ARMA', 'VAR'}, optional
The model used for the initial estimation of VARMA. Valid only for VARMA.
Defaults to 'VAR'.
- information_criterion : {'AIC', 'BIC'}, optional
Information criterion used for order selection.
Defaults to 'AIC'.
- include_mean : bool, optional
Specifies whether the model includes a constant (mean) term.
Valid only when d + D <= 1.
Defaults to True if d + D = 0, else False.
- max_iter : int, optional
Maximum number of iterations of the L-BFGS-B optimizer. Valid only for VMA and VARMA.
Defaults to 200.
- finite_diff_accuracy : int, optional
Polynomial order of the finite-difference scheme used to approximate the gradient of the objective function.
The valid range is from 1 to 4.
Defaults to 1.
- displacement : float, optional
The step length for the finite-difference method.
Valid only for VMA and VARMA.
Defaults to 2.2e-6.
- ftol : float, optional
Tolerance for the convergence test on the objective function.
Valid only for VMA and VARMA.
Defaults to 1e-5.
- gtol : float, optional
Tolerance for the convergence test on the gradient.
Valid only for VMA and VARMA.
Defaults to 1e-5.
- calculate_hessian : bool, optional
Specifies whether to calculate the Hessian matrix.
VMA and VARMA output standard errors of the parameter estimates only when calculate_hessian is True.
Defaults to False.
- calculate_irf : bool, optional
Specifies whether to calculate the impulse response function (IRF).
Defaults to False.
- irf_lags : int, optional
The number of lags of the IRF to be calculated.
Valid only when calculate_irf is True.
Defaults to 8.
- alpha : float, optional
Type I error rate (significance level) used in the Ljung-Box tests and in eccm.
Defaults to 0.05.
- output_fitted : bool, optional
Outputs the fitted values and residuals if True.
Defaults to True.
- thread_ratio : float, optional
Adjusts the percentage of available threads to use, from 0 to 1. A value of 0 indicates the use of a single thread, while 1 implies the use of all possible current threads. Values outside the range will be ignored and this function heuristically determines the number of threads to use.
Defaults to -1.
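The interplay of search_method='grid_search', max_p, max_q and information_criterion can be sketched as an exhaustive search that scores every candidate (p, q) and keeps the lowest criterion value. This is a simplified illustration, not PAL's implementation: fit_fn and toy_fit are hypothetical stand-ins for model estimation, the numbers they return are made up, seasonal orders are ignored, and the textbook AIC/BIC formulas used here may be on a different scale from the values PAL reports.

```python
# Sketch of (p, q) order selection by exhaustive grid search with AIC or BIC.
# fit_fn is a hypothetical stand-in for "fit a model of order (p, q)"; it must
# return (log_likelihood, number_of_parameters, number_of_observations).
import math
from typing import Callable, Optional, Tuple

FitResult = Tuple[float, int, int]


def information_criterion(loglik: float, k: int, n: int, kind: str = "AIC") -> float:
    """Textbook AIC = -2 logL + 2k and BIC = -2 logL + k ln(n); smaller is better."""
    if kind == "AIC":
        return -2.0 * loglik + 2.0 * k
    if kind == "BIC":
        return -2.0 * loglik + k * math.log(n)
    raise ValueError(f"unknown criterion: {kind}")


def grid_search(fit_fn: Callable[[int, int], FitResult],
                max_p: int, max_q: int, kind: str = "AIC") -> Tuple[int, int, float]:
    """Score every (p, q) with 0 <= p <= max_p, 0 <= q <= max_q and keep the minimum."""
    best: Optional[Tuple[int, int, float]] = None
    for p in range(max_p + 1):
        for q in range(max_q + 1):
            loglik, k, n = fit_fn(p, q)
            score = information_criterion(loglik, k, n, kind)
            if best is None or score < best[2]:
                best = (p, q, score)
    assert best is not None
    return best


def toy_fit(p: int, q: int) -> FitResult:
    """Hypothetical fit results for 2 series and 40 observations (made-up numbers)."""
    loglik = -100.0 + 10.0 * min(p, 1) + 5.0 * min(q, 1) + 0.5 * (p + q)
    k = 2 + 4 * (p + q)  # 2 intercepts + one 2x2 coefficient matrix per AR/MA lag
    return loglik, k, 40


print(grid_search(toy_fit, max_p=2, max_q=2, kind="AIC"))  # (1, 1, 188.0)
print(grid_search(toy_fit, max_p=2, max_q=2, kind="BIC"))
```

With these made-up inputs, AIC selects (1, 1) while BIC selects the smaller model (1, 0), because its per-parameter penalty ln(n) exceeds AIC's penalty of 2 once n is 8 or more.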
Examples
Input DataFrame df:
>>> df.collect()
    TIMESTAMP   Y1    X   Y2
0           1  9.8  6.4  8.2
...
38         39  7.2  5.4  5.5
39         40  7.6  5.5  5.8
Create a VectorARIMA instance:
>>> varima = VectorARIMA(model_type='VAR', calculate_irf=True)
Perform fit():
>>> varima.fit(data=df, endog=['Y1', 'Y2'], exog='X')
Output:
>>> varima.model_.head(5).collect()
   CONTENT_INDEX                                      CONTENT_VALUE
0              0                                    {"model":"VAR"}
1              1                                 {"exogCols":["X"]}
2              2                          {"endogCols":["Y1","Y2"]}
3              3  {"D":0,"P":0,"c":1,"d":0,"k":2,"m":2,"nT":40,"...
4              4                        {"AIC":-6.6759375491341144}
>>> varima.fitted_.head(3).collect()
  NAMECOL  IDX   FITTING  RESIDUAL
0      Y1    1       NaN       NaN
1      Y1    2       NaN       NaN
2      Y1    3  9.622092  0.177908
>>> varima.irf_.head(3).collect()
  COL1 COL2  IDX  RESPONSE
0   Y1    X    0  0.243569
1   Y1    X    1  0.139749
2   Y1    X    2 -0.351429
Perform predict():
>>> pred_df.collect()
   TIMESTAMP    X
0         41  5.2
...
4         45  5.7
>>> result_dict, result_all = varima.predict(pred_df)
Output:
>>> result_dict['Y1'].collect()
   IDX  FORECAST        SE      LO95      HI95
0   41  7.577883  0.172352  7.240072  7.915694
...
4   45  6.773185  0.347997  6.091110  7.455259
>>> result_dict['Y2'].collect()
   IDX  FORECAST        SE      LO95      HI95
0   41  5.822953  0.171752  5.486320  6.159586
...
4   45  5.141598  0.298299  4.556933  5.726263
>>> result_all.collect()
  COLNAME  IDX  FORECAST        SE      LO95      HI95
0      Y1   41  7.577883  0.172352  7.240072  7.915694
1      Y1   42  7.202759  0.233421  6.745254  7.660264
...
9      Y2   45  5.141598  0.298299  4.556933  5.726263
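The printed intervals are consistent with a normal-approximation 95% interval, FORECAST ± 1.96 × SE: the check below recomputes LO95 and HI95 from the FORECAST and SE values shown in the example and finds agreement to within rounding (about 1e-6). This is an observation about this example output, not a documented statement of how PAL computes the bounds.

```python
# Check that LO95/HI95 in the example output match FORECAST -/+ 1.96 * SE.
# The rows are copied from the example output; 1.96 is the rounded two-sided
# 95% quantile of the standard normal distribution.
from typing import List, Tuple

Z_95 = 1.96

# (COLNAME, IDX, FORECAST, SE, LO95, HI95)
ROWS: List[Tuple[str, int, float, float, float, float]] = [
    ("Y1", 41, 7.577883, 0.172352, 7.240072, 7.915694),
    ("Y1", 42, 7.202759, 0.233421, 6.745254, 7.660264),
    ("Y1", 45, 6.773185, 0.347997, 6.091110, 7.455259),
    ("Y2", 41, 5.822953, 0.171752, 5.486320, 6.159586),
    ("Y2", 45, 5.141598, 0.298299, 4.556933, 5.726263),
]


def normal_interval(forecast: float, se: float, z: float = Z_95) -> Tuple[float, float]:
    """Return the symmetric interval (forecast - z*se, forecast + z*se)."""
    return forecast - z * se, forecast + z * se


def max_deviation(rows) -> float:
    """Largest absolute gap between the printed bounds and the recomputed ones."""
    worst = 0.0
    for _, _, forecast, se, lo, hi in rows:
        lo_hat, hi_hat = normal_interval(forecast, se)
        worst = max(worst, abs(lo - lo_hat), abs(hi - hi_hat))
    return worst


print(f"max deviation: {max_deviation(ROWS):.2e}")
```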
- Attributes:
- model_ : DataFrame
Model content.
- fitted_ : DataFrame
Fitted values and residuals.
- irf_ : DataFrame
Impulse response function.
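As a hedged illustration of what an impulse response function measures, the sketch computes the simple (non-orthogonalized) IRF of a two-variable VAR(1) model y_t = A y_{t-1} + e_t: the response of variable i, h steps after a unit shock to variable j, is entry (i, j) of the matrix power A^h. The matrix A is hypothetical, and PAL's irf_ table may follow a different definition (for example, orthogonalized shocks); in the example it also lists responses to the exogenous column X, which this sketch does not cover.

```python
# Simple (non-orthogonalized) impulse responses of a VAR(1) model:
# the response at horizon h to a unit shock is the matrix power A^h.
from typing import List

Matrix = List[List[float]]


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    """Plain-Python matrix product X @ Y."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def var1_irf(A: Matrix, lags: int) -> List[Matrix]:
    """Return [A^0, A^1, ..., A^lags]; entry [h][i][j] is the response of
    variable i, h steps after a unit shock to variable j."""
    n = len(A)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # A^0 = identity
    out = [power]
    for _ in range(lags):
        power = matmul(power, A)  # A^h @ A = A^(h+1)
        out.append(power)
    return out


A = [[0.5, 0.1],
     [0.2, 0.4]]  # hypothetical VAR(1) coefficients for (Y1, Y2)
names = ["Y1", "Y2"]
for h, M in enumerate(var1_irf(A, lags=3)):
    for i, response in enumerate(names):
        for j, shock in enumerate(names):
            print(response, shock, h, round(M[i][j], 6))
```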
Methods
fit(data[, key, endog, exog]): Fit the model to the training dataset.
get_model_metrics(): Get the model metrics.
get_score_metrics(): Get the score metrics.
predict([data, key, forecast_length, ...]): Generates time series forecasts based on the fitted model.
set_conn(connection_context): Set connection context for a VectorARIMA instance.
- fit(data, key=None, endog=None, exog=None)
Fit the model to the training dataset.
- Parameters:
- data : DataFrame
DataFrame that contains the key column and the endogenous variables, and optionally the exogenous variables.
- key : str, optional
The timestamp column of data. The type of key column should be INTEGER, TIMESTAMP, DATE, or SECONDDATE.
Defaults to the first column of data if the index column of data is not provided. Otherwise, defaults to the index of data.
- endog : list of str, optional
The endogenous variables, i.e. time series. The type of endog column can be INTEGER, DOUBLE or DECIMAL(p,s).
Defaults to all non-key and non-exog columns of data if not provided.
- exog : str or list of str, optional
The exogenous variables. The type of exog column can be INTEGER, DOUBLE or DECIMAL(p,s).
Defaults to None.
- Returns:
- A fitted object of class "VectorARIMA".
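The documented defaults for key, endog and exog can be summarized by the small helper below. resolve_columns is hypothetical and not part of hana_ml; its index argument stands in for the index column of the input DataFrame. Applied to the columns of the example DataFrame df with exog='X', it yields the same endogenous columns that the example passes explicitly.

```python
# Hypothetical helper (not part of hana_ml) that mirrors the documented
# defaults of fit() for the key, endog and exog arguments.
from typing import List, Optional, Sequence, Tuple, Union


def resolve_columns(columns: Sequence[str],
                    key: Optional[str] = None,
                    endog: Optional[List[str]] = None,
                    exog: Optional[Union[str, List[str]]] = None,
                    index: Optional[str] = None) -> Tuple[str, List[str], List[str]]:
    """Return (key, endog, exog) after applying the documented defaults.

    `index` stands in for the index column of the input DataFrame, if one is set.
    """
    # key: explicit value, else the DataFrame's index column, else the first column.
    if key is None:
        key = index if index is not None else columns[0]
    # exog: a single name becomes a one-element list; None means no exog columns.
    if exog is None:
        exog_cols: List[str] = []
    elif isinstance(exog, str):
        exog_cols = [exog]
    else:
        exog_cols = list(exog)
    # endog: explicit value, else every column that is neither the key nor exogenous.
    if endog is None:
        endog_cols = [c for c in columns if c != key and c not in exog_cols]
    else:
        endog_cols = list(endog)
    return key, endog_cols, exog_cols


# Column layout of the example DataFrame df.
print(resolve_columns(["TIMESTAMP", "Y1", "X", "Y2"], exog="X"))
# ('TIMESTAMP', ['Y1', 'Y2'], ['X'])
```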
- set_conn(connection_context)
Set connection context for a VectorARIMA instance.
- Parameters:
- connection_context : ConnectionContext
The connection to the SAP HANA system.
- Returns:
- None.
- predict(data=None, key=None, forecast_length=None, allow_new_index=False)
Generates time series forecasts based on the fitted model.
- Parameters:
- data : DataFrame, optional
Index and exogenous variables for forecasting, structured as follows:
First column: index (ID), int.
Other columns: exogenous variables, with type INTEGER, DOUBLE or DECIMAL(p,s).
Defaults to None.
- key : str, optional
The timestamp column of data. The type of key column is int.
Defaults to the first column of data if the index column of data is not provided. Otherwise, defaults to the index column of data.
- forecast_length : int, optional
Number of points to forecast. Valid only when data is not provided.
Defaults to None.
- allow_new_index : bool, optional
Indicates whether a new index column is allowed in the result.
True: returns the result with a new integer or timestamp index column.
False: returns the result with an index column starting from 0.
Defaults to False.
- Returns:
- Dict of DataFrames
Collection of forecasted values, keyed by endogenous column name. Each DataFrame is structured as follows:
ID: type INTEGER, timestamp.
FORECAST: type DOUBLE, forecast value.
SE: type DOUBLE, standard error.
LO95: type DOUBLE, lower bound of the 95% prediction interval.
HI95: type DOUBLE, upper bound of the 95% prediction interval.
- DataFrame
The forecasted values of all endogenous variables in a single DataFrame, structured as follows:
COLNAME: type NVARCHAR(5000), name of the endogenous variable.
ID: type INTEGER, timestamp.
FORECAST: type DOUBLE, forecast value.
SE: type DOUBLE, standard error.
LO95: type DOUBLE, lower bound of the 95% prediction interval.
HI95: type DOUBLE, upper bound of the 95% prediction interval.
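In the example output, the two return values carry the same forecasts in different shapes: the dictionary holds one table per endogenous variable, while the aggregated DataFrame stacks them and adds a COLNAME column. The plain-Python sketch below shows that reshaping on a few rows copied from the example, using lists of tuples in place of hana_ml DataFrames; it illustrates the relationship only and is not how hana_ml builds either result.

```python
# Plain-Python sketch of how the two predict() outputs relate: the aggregated
# table stacks the per-variable tables and adds a COLNAME column.
# Lists of tuples stand in for hana_ml DataFrames; rows are copied from the example.
from collections import defaultdict
from typing import Dict, List, Tuple

# (COLNAME, IDX, FORECAST, SE, LO95, HI95)
Row = Tuple[str, int, float, float, float, float]
# (IDX, FORECAST, SE, LO95, HI95)
Forecast = Tuple[int, float, float, float, float]


def split_by_colname(rows: List[Row]) -> Dict[str, List[Forecast]]:
    """Group aggregated rows into one list per endogenous variable, dropping COLNAME."""
    grouped: Dict[str, List[Forecast]] = defaultdict(list)
    for colname, idx, forecast, se, lo95, hi95 in rows:
        grouped[colname].append((idx, forecast, se, lo95, hi95))
    return dict(grouped)


result_all_rows: List[Row] = [
    ("Y1", 41, 7.577883, 0.172352, 7.240072, 7.915694),
    ("Y1", 42, 7.202759, 0.233421, 6.745254, 7.660264),
    ("Y2", 45, 5.141598, 0.298299, 4.556933, 5.726263),
]
result_dict_like = split_by_colname(result_all_rows)
print(sorted(result_dict_like))  # ['Y1', 'Y2']
print(result_dict_like["Y1"][0])
```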
- get_model_metrics()
Get the model metrics.
- Returns:
- DataFrame
The model metrics.
- get_score_metrics()
Get the score metrics.
- Returns:
- DataFrame
The score metrics.
Inherited Methods from PALBase
Besides the methods above, the VectorARIMA class also inherits methods from the PALBase class; please refer to PAL Base for more details.