predict.AutoARIMA.Rd
Similar to other predict methods, this function predicts fitted values from a fitted "hanaml.AutoARIMA" object.
# S3 method for AutoARIMA
predict(
model,
data = NULL,
key = NULL,
forecast.method = NULL,
forecast.length = NULL,
show.explainer = FALSE,
thread.ratio = NULL,
top.k.attributions = NULL,
trend.mod = NULL,
trend.width = NULL,
seasonal.width = NULL,
group.key = NULL,
group.params = NULL,
...
)
S3
methods
model
: R6Class object
A "hanaml.AutoARIMA" object for prediction.
data
: DataFrame, optional
Includes the ID column and external data (exogenous variables) for prediction.
Defaults to NULL.
key
: character, optional
Name of the key (ID) column in data.
Defaults to NULL. If data is not NULL and key is not provided, defaults to the first column of data.
forecast.method
: c("formula.forecast", "innovations.algorithm"), optional
Specifies the method used to compute the forecast.
"formula.forecast": compute the future series via formula.
"innovations.algorithm": apply the innovations algorithm to compute the future series, which requires more original information to be stored.
Defaults to "innovations.algorithm".
forecast.length
: integer, optional
Number of points to forecast.
Defaults to 1.
show.explainer
: logical, optional
Indicates whether to invoke the explanation function of hanaml.ARIMA during prediction.
Only valid when background.size is set when initializing a hanaml.ARIMA instance.
If TRUE, the contributions of the trend, seasonal, transitory, irregular, and exogenous components are
shown in an attribute called explainer of the hanaml.ARIMA or hanaml.AutoARIMA instance.
Defaults to FALSE.
thread.ratio
: double, optional
Controls the proportion of available threads to use.
0: single thread.
0~1: percentage.
Others: heuristically determined.
Defaults to -1. Valid only when show.explainer is TRUE.
top.k.attributions
: integer, optional
Specifies the number of attributes with the largest contributions to output.
Attributes with zero contribution are not output.
Valid only when show.explainer is TRUE.
Defaults to 10.
trend.mod
: double, optional
The real AR roots with inverse modulus larger than trend.mod
are integrated into the trend component.
Valid only when show.explainer is TRUE.
Cannot be smaller than 0.
Defaults to 0.4.
trend.width
: double, optional
Specifies the bandwidth of the spectrum of the trend component, in radians.
Valid only when show.explainer is TRUE. Cannot be smaller than 0.
Defaults to 0.035.
seasonal.width
: double, optional
Specifies the bandwidth of the spectrum of the seasonal component, in radians.
Valid only when show.explainer is TRUE. Cannot be smaller than 0.
Defaults to 0.035.
group.key
: character, optional
The name of the group key column. The data type can be INT or NVARCHAR/VARCHAR.
If the data type is INT, only parameters set in group.params are valid.
This parameter is only valid when massive is TRUE.
Defaults to the first column of data if group.key is not provided.
group.params
: list, optional
If the massive mode is activated (massive = TRUE),
the input data is divided into groups, and different parameters can be applied to each group.
An example is as follows:
> mautoarima <- hanaml.AutoARIMA(data=df,
                                 massive=TRUE,
                                 background.size=5,
                                 group.key="GROUP_ID",
                                 group.params=list("Group_1"=list('allow.linear'=FALSE)))
> mres <- predict(model=mautoarima,
                  data=pred.df,
                  group.key="GROUP_ID",
                  key="TIMESTAMP",
                  show.explainer=TRUE,
                  group.params=list("GROUP_A"=list("forecast.method"="innovations.algorithm"),
                                    "GROUP_B"=list("forecast.method"="innovations.algorithm")))
...
: Reserved parameter.
Predicted values are returned as a DataFrame, structured as follows:
ID
: with the same name and type as the ID column of data.
FORECAST
: type DOUBLE, predicted values.
SE
: type DOUBLE, standard error.
LO80
: type DOUBLE, lower bound of the 80% interval.
HI80
: type DOUBLE, upper bound of the 80% interval.
LO95
: type DOUBLE, lower bound of the 95% interval.
HI95
: type DOUBLE, upper bound of the 95% interval.
Note that if show.explainer=TRUE, the attribute explainer is generated.
When massive=TRUE, an additional error message DataFrame is returned.
Obtaining forecast values alone is not enough to understand the model and the data in depth.
Users also need to understand why a prediction was made before they can trust it and act on it.
A financial analyst may want to know which latent stock most affects the stock of interest,
and relevant departments would benefit greatly from knowing the major factors behind changes in
weather, unemployment rates, electricity consumption, and so on. In all those scenarios,
the importance of exogenous variables (variables that are not affected by others) can
be quantified as the contributions they make.
Let us recap the formula of an ARIMAX model with p exogenous variables
\(X_t = (x_t^{(1)}, x_t^{(2)}, \ldots, x_t^{(p)})\):
$$\Phi(B)\phi(B^s)\widetilde{y_t}=\theta(B)\Theta(B^s)\epsilon_t+\sum_{i=1}^p\beta_i\tilde{x_t}^{(i)}$$,
where \(\widetilde{y_t}=(1-B)^d(1-B^s)^Dy_t, \tilde{x_t}^{(i)}=(1-B)^d(1-B^s)^Dx_t^{(i)}\) and \(\beta_i\)
is the corresponding regression coefficient.
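As a minimal sketch of the differencing operator \((1-B)^d(1-B^s)^D\) in the formula above, base R's diff() can apply the seasonal and regular differences in turn. The series y and the orders d, D, and s below are made-up illustration values, not taken from any fitted model.

```r
# Hypothetical toy series and orders, for illustration only.
y <- c(5, 7, 6, 9, 8, 11, 10, 14, 12, 16, 15, 19)
d <- 1  # regular differencing order
D <- 1  # seasonal differencing order
s <- 4  # seasonal period

# (1 - B^s)^D y_t: seasonal differencing at lag s.
y_seasonal <- diff(y, lag = s, differences = D)

# (1 - B)^d applied to the seasonally differenced series.
y_tilde <- diff(y_seasonal, lag = 1, differences = d)

print(y_tilde)
```

Note that diff() requires differences >= 1, so an order of 0 has to be handled by skipping the corresponding call. The same transformation is applied to each exogenous series \(x_t^{(i)}\).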
As can be seen from the above formula, there are two major parts that can be considered individually
when explaining the forecasted values: the ARIMA part and the regressor part.
Interpreting the ARIMA Part
To interpret the ARIMA part, we have implemented a decomposition method derived from digital signal processing.
In the frequency domain, any time series has components in the low-frequency, high-frequency, and band-frequency
areas, as well as some irregular ones. In particular, the trend part is assumed to have low frequencies and the seasonal part to
have high frequencies. A Fast Fourier Transform (FFT) is not required to find them, because the auto-regressive and moving-average parts of an ARIMA model
also reveal those components inside a time series. Therefore, our algorithm decomposes the forecasted values into trend, seasonal,
transitory, and irregular parts directly from the trained model. If a component is marked "?" in the output DataFrame,
the corresponding component cannot be decomposed under the current settings.
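The role of trend.mod can be illustrated with a small sketch that inspects the roots of an AR polynomial using base R's polyroot(). This only mirrors the rule as worded in this page (real AR roots whose inverse modulus exceeds trend.mod go to the trend component); it is not the library's implementation, and the coefficients phi and the variable names are hypothetical.

```r
# Hypothetical AR(2) coefficients: 1 - 1.1 z + 0.18 z^2 = (1 - 0.9 z)(1 - 0.2 z).
phi <- c(1.1, -0.18)
trend_mod <- 0.4  # same value as the documented default of trend.mod

# polyroot() expects coefficients in increasing order of power.
roots <- polyroot(c(1, -phi))

# A root is treated as real when its imaginary part is numerically zero.
is_real <- abs(Im(roots)) < 1e-8
inv_mod <- 1 / Mod(roots)

# Real roots whose inverse modulus exceeds the threshold are flagged as trend.
is_trend <- is_real & inv_mod > trend_mod

print(data.frame(inv_modulus = round(inv_mod, 4), trend = is_trend))
```

In this toy case the root with inverse modulus 0.9 is flagged and the one with inverse modulus 0.2 is not, so raising trend.mod assigns fewer roots to the trend component.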
Interpreting the Regressor Part
To interpret the regressor part, we adopted the LinearSHAP algorithm, which generates the contribution of
each exogenous feature to the forecasted values given the regression coefficients and background data.
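For intuition, the textbook Linear SHAP formula (assuming independent features) attributes to feature i the value beta_i * (x_i - mean of x_i over the background data). The sketch below applies that formula to made-up numbers; the coefficients, background rows, and variable names are hypothetical, and the library's internal computation may differ in detail.

```r
# Hypothetical regression coefficients for two exogenous variables.
beta <- c(x1 = 2.0, x2 = -0.5)

# Hypothetical background data (three rows), as background.size would select.
background <- matrix(c(1, 2, 3,
                       10, 20, 30),
                     ncol = 2,
                     dimnames = list(NULL, c("x1", "x2")))

# Exogenous values at the time point being explained.
x_new <- c(x1 = 4, x2 = 10)

# The baseline is the mean of each feature over the background data.
baseline <- colMeans(background)

# Linear SHAP contribution of each feature.
contrib <- beta * (x_new - baseline)
print(contrib)

# The contributions sum to the difference between the regression term
# at x_new and the regression term at the baseline.
print(sum(contrib))
```

Because the contributions are measured against the background mean, a larger background sample gives a more stable baseline.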
Key Relevant Parameters
background.size
: This parameter specifies the size of the background data for LinearSHAP. It must be set to a non-zero value
in hanaml.ARIMA in order to generate local interpretation.
show.explainer
: This parameter serves as a trigger for interpreting the ARIMA model. Set it to TRUE if
interpretability is desired.
top.k.attributions
: This parameter specifies the number of attributions with the highest contributions to
the forecast values to output. Note that attributes with zero contribution are not displayed.
trend.mod
: The real auto-regressive roots with inverse modulus larger than the value specified in trend.mod
are integrated into the trend component.
trend.width
: This parameter specifies the bandwidth of the spectrum of the trend component, in radians.
seasonal.width
: This parameter specifies the bandwidth of the spectrum of the seasonal component, in radians.
Call the function and obtain the result:
> predict(model=autoarima, forecast.length=5)
TIMESTAMP FORECAST SE LO80 HI80 LO95 HI95
1 0 -15.544832 3.298697 -19.772283 -11.31738 -22.0101587 -9.079505
2 1 35.587390 3.404891 31.223846 39.95094 28.9139269 42.260854
3 2 56.498532 3.411723 52.126231 60.87083 49.8116773 63.185386
4 3 7.086176 3.412170 2.713303 11.45905 0.3984467 13.773906
5 4 -16.266996 3.412250 -20.639972 -11.89402 -22.9548838 -9.579108
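The interval columns in the output above appear consistent with symmetric normal-quantile bounds around FORECAST, scaled by SE. This is an observation from the printed numbers rather than a documented guarantee. A minimal base R check on the first row:

```r
# Values copied from the first row of the output above.
forecast <- -15.544832
se <- 3.298697

# Two-sided standard normal quantiles for 80% and 95% coverage.
z80 <- qnorm(0.90)
z95 <- qnorm(0.975)

intervals <- c(
  LO80 = forecast - z80 * se,
  HI80 = forecast + z80 * se,
  LO95 = forecast - z95 * se,
  HI95 = forecast + z95 * se
)
print(round(intervals, 4))
```

The computed bounds agree with the LO80, HI80, LO95, and HI95 values printed for that row to about four decimal places.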
To see the decomposition of the prediction result, set background.size when initializing the instance and set show.explainer = TRUE in predict():
> autoarm <- hanaml.AutoARIMA(data=data,
                              background.size=10)
Invoke predict():
> result <- predict(model=autoarm,
                    forecast.method="innovations.algorithm",
                    forecast.length=3,
                    show.explainer=TRUE)
Show the explainer of the hanaml.AutoARIMA instance:
> autoarm$explainer$Collect()
TIMESTAMP TREND SEASONAL TRANSITORY IRREGULAR EXOGENOUS
1 0 0.1452041 -0.9329735 0.9274021 -24.93706
2 1 4.6110870 0.3368592 12.9455897 25.75553
3 2 6.6124186 0.8155893 17.1545481 47.95495