VectorARIMA
- class hana_ml.algorithms.pal.tsa.vector_arima.VectorARIMA(order=None, seasonal_order=None, model_type=None, search_method=None, lag_num=None, max_p=None, max_q=None, max_seasonal_p=None, max_seasonal_q=None, max_lag_num=None, init_guess=None, information_criterion=None, include_mean=None, max_iter=None, finite_diff_accuracy=None, displacement=None, ftol=None, gtol=None, calculate_hessian=None, calculate_irf=None, irf_lags=None, alpha=None, output_fitted=None, thread_ratio=None)
Vector Autoregressive Integrated Moving Average ARIMA(p, d, q) model.
- Parameters
- order(p, d, q), tuple of int, optional
Indicate the order (p, d, q).
p: value of the auto regression order. -1 indicates auto and >=0 is user-defined.
d: value of the differencing order.
q: value of the moving average order. -1 indicates auto and >=0 is user-defined.
Defaults to (-1, 0, -1).
- seasonal_order(P, D, Q, s), tuple of int, optional
Indicate the seasonal order (P, D, Q, s).
P: value of the auto regression order for the seasonal part. -1 indicates auto and >=0 is user-defined.
D: value of the differencing order for the seasonal part.
Q: value of the moving average order for the seasonal part. -1 indicates auto and >=0 is user-defined.
s: value of the seasonal period. -1 indicates auto and >=0 is user-defined.
Defaults to (-1, 0, -1, 0).
- model_type{'VAR', 'VMA', 'VARMA'}, optional
The model type.
Defaults to 'VARMA'.
- search_method{'eccm', 'grid_search'}, optional
Specifies how the orders of the model are determined. 'eccm' is valid only when the seasonal period is less than 1.
Defaults to 'grid_search'.
- lag_numint, optional
The lag number of explanatory variables. Valid only when model_type is 'VAR'.
Defaults to 4.
- max_pint, optional
The maximum value of vector AR order p.
Defaults to 6 if model_type is 'VAR', or if model_type is 'VARMA' and search_method is 'eccm'.
Defaults to 2 if model_type is 'VARMA' and search_method is 'grid_search'.
- max_qint, optional
The maximum value of vector MA order q.
Defaults to 8 if model_type is 'VMA'.
Defaults to 5 if model_type is 'VARMA' and search_method is 'eccm'.
Defaults to 2 if model_type is 'VARMA' and search_method is 'grid_search'.
- max_seasonal_pint, optional
The maximum value of seasonal vector AR order P.
Defaults to 3 if model_type is 'VAR'.
Defaults to 1 if model_type is 'VARMA' and search_method is 'grid_search'.
- max_seasonal_qint, optional
The maximum value of seasonal vector MA order Q.
Defaults to 1.
- max_lag_numint, optional
The maximum lag number of explanatory variables. Valid only when model_type is 'VAR'.
Defaults to 4.
- init_guess{'ARMA', 'VAR'}, optional
The model used as initial estimation for VARMA. Valid only for VARMA.
Defaults to 'VAR'.
- information_criterion{'AIC', 'BIC'}, optional
Information criteria for order specification.
Defaults to 'AIC'.
- include_meanbool, optional
ARIMA model includes a constant part if True.
Valid only when d + D <= 1.
Defaults to True if d + D = 0 else False.
- max_iterint, optional
Maximum number of iterations of L-BFGS-B optimizer. Valid only for VMA and VARMA.
Defaults to 200.
- finite_diff_accuracyint, optional
Polynomial order of finite difference.
Approximate the gradient of objective function with finite difference.
The valid range is from 1 to 4.
Defaults to 1.
- displacementfloat, optional
The step length for finite-difference method.
Valid only for VMA and VARMA.
Defaults to 2.2e-6.
- ftolfloat, optional
Tolerance for objective convergence test.
Valid only for VMA and VARMA.
Defaults to 1e-5.
- gtolfloat, optional
Tolerance for gradient convergence test.
Valid only for VMA and VARMA.
Defaults to 1e-5.
- calculate_hessianbool, optional
Specifies whether to calculate the Hessian matrix.
VMA and VARMA will output standard error of parameter estimates only when calculate_hessian is True.
Defaults to False.
- calculate_irfbool, optional
Specifies whether to calculate impulse response function.
Defaults to False.
- irf_lagsint, optional
The number of lags of the IRF to be calculated.
Valid only when calculate_irf is True.
Defaults to 8.
- alphafloat, optional
Type-I error used in the Ljung-Box tests and eccm.
Defaults to 0.05.
- output_fittedbool, optional
Output fitted result and residuals if True.
Defaults to True.
- thread_ratiofloat, optional
Controls the proportion of available threads to use.
0: single thread.
0~1: percentage of available threads.
Others: heuristically determined.
Defaults to -1.
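The conditional defaults listed above for max_p, max_q and include_mean can be hard to keep in mind. The following is a minimal sketch that restates them as plain Python. The helper names (default_max_p, default_max_q, default_include_mean) are hypothetical and are not part of hana_ml; combinations for which no default is stated above return None.

```python
from typing import Optional


def default_max_p(model_type: str = "VARMA",
                  search_method: str = "grid_search") -> Optional[int]:
    """Documented default of max_p (hypothetical helper, not a hana_ml API)."""
    if model_type == "VAR":
        return 6
    if model_type == "VARMA":
        # 6 with 'eccm', 2 with 'grid_search'
        return {"eccm": 6, "grid_search": 2}.get(search_method)
    return None  # no default is stated for other combinations


def default_max_q(model_type: str = "VARMA",
                  search_method: str = "grid_search") -> Optional[int]:
    """Documented default of max_q (hypothetical helper, not a hana_ml API)."""
    if model_type == "VMA":
        return 8
    if model_type == "VARMA":
        # 5 with 'eccm', 2 with 'grid_search'
        return {"eccm": 5, "grid_search": 2}.get(search_method)
    return None  # no default is stated for other combinations


def default_include_mean(d: int = 0, seasonal_d: int = 0) -> bool:
    """include_mean defaults to True only when d + D = 0."""
    return d + seasonal_d == 0


print(default_max_p("VAR"), default_max_p(), default_max_q("VARMA", "eccm"))
```

With the class defaults (model_type 'VARMA', search_method 'grid_search'), both max_p and max_q resolve to 2.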
Examples
Vector ARIMA example:
Input dataframe df:
>>> df.collect()
    TIMESTAMP   Y1    X   Y2
0           1  9.8  6.4  8.2
1           2  9.7  6.4  8.1
2           3  9.8  6.3    8
3           4  9.7  6.2  7.9
4           5  9.6  6.3  7.8
5           6  9.6  6.8  7.6
6           7  9.6  6.8  7.5
7           8    9  6.8  7.5
8           9  9.2  6.8  7.4
9          10  9.2  6.7  7.5
10         11  9.1  6.6  7.6
11         12    9  6.6  7.5
12         13  8.8    6  7.2
13         14  8.8    6  7.7
14         15  8.7  5.9    7
15         16  8.3  5.8  6.5
16         17  8.2  5.9  6.4
17         18  8.2  6.3  6.3
18         19  8.2  6.3  6.1
19         20  8.4  6.4    6
20         21  8.1  6.4  6.1
21         22  7.8  6.5    6
22         23  7.7  6.5  5.9
23         24  7.5  6.3  5.9
24         25  7.2  6.5  5.7
25         26  7.2  6.4  5.8
26         27    7  6.3  5.8
27         28    7    6  5.5
28         29  6.9  6.2  5.4
29         30    7  5.9  5.4
30         31  7.1    6  5.3
31         32  7.4    6  5.4
32         33  6.9  5.8  5.5
33         34  6.8  5.8  5.4
34         35    7  5.6  5.4
35         36  7.1  5.6  5.4
36         37    7  5.3  5.7
37         38    7  5.3  5.6
38         39  7.2  5.4  5.5
39         40  7.6  5.5  5.8
Create a VectorARIMA instance:
>>> varima = VectorARIMA(model_type='VAR', calculate_irf=True)
Perform fit on the given data:
>>> varima.fit(data=df, endog=['Y1', 'Y2'], exog='X')
Expected output:
>>> varima.model_.head(5).collect()
   CONTENT_INDEX                                      CONTENT_VALUE
0              0                                    {"model":"VAR"}
1              1                                 {"exogCols":["X"]}
2              2                          {"endogCols":["Y1","Y2"]}
3              3  {"D":0,"P":0,"c":1,"d":0,"k":2,"m":2,"nT":40,"...
4              4                        {"AIC":-6.6759375491341144}
>>> varima.fitted_.head(3).collect()
  NAMECOL  IDX   FITTING  RESIDUAL
0      Y1    1       NaN       NaN
1      Y1    2       NaN       NaN
2      Y1    3  9.622092  0.177908
>>> varima.irf_.head(3).collect()
  COL1 COL2  IDX  RESPONSE
0   Y1    X    0  0.243569
1   Y1    X    1  0.139749
2   Y1    X    2 -0.351429
Perform predict on the model:
>>> pred_df.collect()
   TIMESTAMP    X
0         41  5.2
1         42  5.2
2         43  5.2
3         44  5.2
4         45  5.7
>>> result_dict, result_all = varima.predict(pred_df)
Expected output:
>>> result_dict['Y1'].head(5).collect()
   IDX  FORECAST        SE      LO95      HI95
0   41  7.577883  0.172352  7.240072  7.915694
1   42  7.202759  0.233421  6.745254  7.660264
2   43  7.074507  0.279358  6.526966  7.622049
3   44  6.856650  0.316641  6.236034  7.477265
4   45  6.773185  0.347997  6.091110  7.455259
>>> result_dict['Y2'].head(5).collect()
   IDX  FORECAST        SE      LO95      HI95
0   41  5.822953  0.171752  5.486320  6.159586
1   42  5.837502  0.216817  5.412541  6.262464
2   43  5.577920  0.249243  5.089403  6.066437
3   44  5.395543  0.275731  4.855109  5.935976
4   45  5.141598  0.298299  4.556933  5.726263
>>> result_all.head(10).collect()
  COLNAME  IDX  FORECAST        SE      LO95      HI95
0      Y1   41  7.577883  0.172352  7.240072  7.915694
1      Y1   42  7.202759  0.233421  6.745254  7.660264
2      Y1   43  7.074507  0.279358  6.526966  7.622049
3      Y1   44  6.856650  0.316641  6.236034  7.477265
4      Y1   45  6.773185  0.347997  6.091110  7.455259
5      Y2   41  5.822953  0.171752  5.486320  6.159586
6      Y2   42  5.837502  0.216817  5.412541  6.262464
7      Y2   43  5.577920  0.249243  5.089403  6.066437
8      Y2   44  5.395543  0.275731  4.855109  5.935976
9      Y2   45  5.141598  0.298299  4.556933  5.726263
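In the example output, result_all appears to stack the per-series tables of result_dict, with COLNAME identifying the series. The sketch below illustrates that relationship only; it uses plain Python tuples in place of DataFrame rows, and the helper split_by_column is hypothetical rather than part of hana_ml.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (COLNAME, IDX, FORECAST) values copied from the first rows of each series
# in the example output; tuples stand in for DataFrame rows here.
rows = [
    ("Y1", 41, 7.577883),
    ("Y1", 42, 7.202759),
    ("Y2", 41, 5.822953),
    ("Y2", 42, 5.837502),
]


def split_by_column(
    stacked: List[Tuple[str, int, float]]
) -> Dict[str, List[Tuple[int, float]]]:
    """Group stacked forecast rows by COLNAME (hypothetical helper)."""
    grouped: Dict[str, List[Tuple[int, float]]] = defaultdict(list)
    for colname, idx, forecast in stacked:
        grouped[colname].append((idx, forecast))
    return dict(grouped)


per_series = split_by_column(rows)
print(per_series["Y1"])
```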
- Attributes
- model_DataFrame
Model content.
- fitted_DataFrame
Fitted values and residuals.
- irf_DataFrame
Impulse response function.
Methods
fit(data[, key, endog, exog])
Generates ARIMA models with given parameters.
predict([data, key, forecast_length, ...])
Makes time series forecast based on the estimated ARIMA model.
set_conn(connection_context)
Set connection context for ARIMA and AutoARIMA instance.
- fit(data, key=None, endog=None, exog=None)
Generates ARIMA models with given parameters.
- Parameters
- dataDataFrame
DataFrame that includes the key column and the endogenous variables, and may also contain exogenous variables.
- keystr, optional
The timestamp column of data. The type of key column should be INTEGER, TIMESTAMP, DATE, or SECONDDATE.
Defaults to the first column of data if the index column of data is not provided. Otherwise, defaults to the index of data.
- endoglist of str, optional
The endogenous variables, i.e. time series. The type of endog column can be INTEGER, DOUBLE or DECIMAL(p,s).
Defaults to all non-key and non-exog columns of data if not provided.
- exoglist of str, optional
An optional array of exogenous variables. The type of exog column can be INTEGER, DOUBLE or DECIMAL(p,s).
Defaults to None.
- Returns
- A fitted object of class "VectorARIMA".
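The column defaults of fit can be sketched in plain Python as follows. This is an illustration of the documented rules only, assuming data has no index column (so key falls back to the first column); resolve_columns is a hypothetical helper and not a hana_ml function.

```python
from typing import List, Optional, Sequence, Tuple, Union


def resolve_columns(
    columns: Sequence[str],
    key: Optional[str] = None,
    endog: Optional[Sequence[str]] = None,
    exog: Optional[Union[str, Sequence[str]]] = None,
) -> Tuple[str, List[str], List[str]]:
    """Apply the documented defaults for key, endog and exog.

    Assumes the data has no index column, so key defaults to the first column.
    """
    resolved_key = key if key is not None else columns[0]
    if exog is None:
        exog_list: List[str] = []
    elif isinstance(exog, str):
        exog_list = [exog]  # the example above passes a single name, exog='X'
    else:
        exog_list = list(exog)
    if endog is None:
        # all non-key and non-exog columns
        endog_list = [c for c in columns
                      if c != resolved_key and c not in exog_list]
    else:
        endog_list = list(endog)
    return resolved_key, endog_list, exog_list


print(resolve_columns(["TIMESTAMP", "Y1", "X", "Y2"], exog="X"))
```

For the example table, omitting endog while passing exog='X' would select Y1 and Y2 under these rules, which matches the explicit endog=['Y1', 'Y2'] used in the example.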
- set_conn(connection_context)
Set connection context for ARIMA and AutoARIMA instance.
- Parameters
- connection_contextConnectionContext
The connection to the SAP HANA system.
- Returns
- None.
- predict(data=None, key=None, forecast_length=None, allow_new_index=False)
Makes time series forecast based on the estimated ARIMA model.
- Parameters
- dataDataFrame, optional
Index and exogenous variables for forecast. The structure is as follows:
First column: Index (ID), int.
Other columns : exogenous variables, with type INTEGER, DOUBLE or DECIMAL(p,s).
Defaults to None.
- keystr, optional
The timestamp column of data. The type of key column is int.
Defaults to the first column of data if the index column of data is not provided. Otherwise, defaults to the index column of data.
- forecast_lengthint, optional
Number of points to forecast. Valid only when data is not provided.
Defaults to None.
- allow_new_indexbool, optional
Indicate whether a new index column is allowed in the result.
True: return the result with new integer or timestamp index column.
False: return the result with index column starting from 0.
Defaults to False.
- Returns
- Dict of DataFrames
Forecasted values for each endogenous variable, keyed by column name. Each DataFrame is structured as follows:
ID, type INTEGER, timestamp.
FORECAST, type DOUBLE, forecast value.
SE, type DOUBLE, standard error.
LO95, type DOUBLE, lower bound of the 95% interval.
HI95, type DOUBLE, upper bound of the 95% interval.
- DataFrame
The aggregated forecasted values for all endogenous variables, structured as follows:
COLNAME, type NVARCHAR(5000), name of the endogenous variable.
ID, type INTEGER, timestamp.
FORECAST, type DOUBLE, forecast value.
SE, type DOUBLE, standard error.
LO95, type DOUBLE, lower bound of the 95% interval.
HI95, type DOUBLE, upper bound of the 95% interval.
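The numbers in the example output are consistent with LO95 and HI95 being FORECAST minus and plus 1.96 times SE. This is an observation from the printed values, not a formula stated in this reference, so the sketch below should be read as a sanity check under that assumption. The rows are copied from the Y1 forecast in the example.

```python
# Assumed multiplier: the usual two-sided 95% normal quantile, rounded to 1.96.
Z_95 = 1.96

# (IDX, FORECAST, SE, LO95, HI95) copied from the example output for Y1.
rows = [
    (41, 7.577883, 0.172352, 7.240072, 7.915694),
    (42, 7.202759, 0.233421, 6.745254, 7.660264),
    (43, 7.074507, 0.279358, 6.526966, 7.622049),
]


def interval(forecast: float, se: float, z: float = Z_95):
    """Return (lower, upper) as forecast -/+ z * se."""
    return forecast - z * se, forecast + z * se


# Largest absolute difference between the recomputed and the printed bounds.
max_gap = 0.0
for _, forecast, se, lo95, hi95 in rows:
    lower, upper = interval(forecast, se)
    max_gap = max(max_gap, abs(lower - lo95), abs(upper - hi95))

print(max_gap < 1e-5)
```

The small remaining differences are at the level expected from the six-decimal rounding of the printed values.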
- property fit_hdbprocedure
Returns the generated hdbprocedure for fit.
- property predict_hdbprocedure
Returns the generated hdbprocedure for predict.