Like other predict methods, this function generates forecasts from a fitted "hanaml.AutoARIMA" object.

# S3 method for AutoARIMA
predict(
  model,
  data = NULL,
  key = NULL,
  forecast.method = NULL,
  forecast.length = NULL,
  show.explainer = FALSE,
  thread.ratio = NULL,
  top.k.attributions = NULL,
  trend.mod = NULL,
  trend.width = NULL,
  seasonal.width = NULL,
  group.key = NULL,
  group.params = NULL,
  ...
)

Format

S3 methods

Arguments

model

R6Class object
A "hanaml.AutoARIMA" object for prediction.

data

DataFrame, optional
Contains the ID column and external data (exogenous variables) for prediction.
Defaults to NULL.

key

character, optional
Name of the key in the data.
Defaults to NULL. If data is not NULL and key is not provided, the first column of data is used.

forecast.method

c("formula.forecast", "innovations.algorithm"), optional
Specifies the method used to compute the forecast.

  • "formula.forecast": compute future series via formula.

  • "innovations.algorithm": apply the innovations algorithm to compute future series, which requires more original information to be stored.

Defaults to "innovations.algorithm".

forecast.length

integer, optional
Number of points to forecast.
Defaults to 1.

show.explainer

logical, optional
Indicates whether to invoke hanaml.ARIMA with explanations during prediction. Only valid when background.size is set when initializing a hanaml.ARIMA instance. If TRUE, the contributions of the trend, seasonal, transitory, irregular, and exogenous components are shown in an attribute called explainer of the hanaml.ARIMA and hanaml.AutoARIMA instance.
Defaults to FALSE.

thread.ratio

double, optional
Controls the proportion of available threads to use.

  • 0: single thread.

  • 0~1: percentage of available threads.

  • Others: heuristically determined.

Defaults to -1. Valid only when show.explainer is TRUE.

top.k.attributions

integer, optional
Specifies how many of the attributes with the largest contributions are output. Attributes with zero contribution are not output.
Valid only when show.explainer is TRUE.
Defaults to 10.

trend.mod

double, optional
The real AR roots with inverse modulus larger than trend.mod will be integrated into trend component. Valid only when show.explainer is TRUE.
Cannot be smaller than 0.
Defaults to 0.4.

trend.width

double, optional
Specifies the bandwidth of spectrum of trend component in unit of rad. Valid only when show.explainer is TRUE. Cannot be smaller than 0.
Defaults to 0.035.

seasonal.width

double, optional
Specifies the bandwidth of spectrum of seasonal component in unit of rad. Valid only when show.explainer is TRUE. Cannot be smaller than 0.
Defaults to 0.035.

group.key

character, optional
The column of group key. The data type can be INT or NVARCHAR/VARCHAR. If data type is INT, only parameters set in the group.params are valid. This parameter is only valid when massive is TRUE.
Defaults to the first column of data if group.key is not provided.

group.params

list, optional
If the massive mode is activated (massive = TRUE), input data shall be divided into different groups with different parameters applied.
An example is as follows:


> mautoarima <- hanaml.AutoARIMA(data=df,
                                 massive=TRUE,
                                 background.size=5,
                                 group.key="GROUP_ID",
                                 group.params=list("Group_1"=list("allow.linear"=FALSE)))
> mres <- predict(model=mautoarima,
                  data=pred.df,
                  group.key="GROUP_ID",
                  key="TIMESTAMP",
                  show.explainer=TRUE,
                  group.params=list("GROUP_A"=list("forecast.method"="innovations.algorithm"),
                                    "GROUP_B"=list("forecast.method"="innovations.algorithm")))
 
...

Reserved parameter.

Value

Predicted values are returned as a DataFrame, structured as follows:

  • ID: with the same name and type as the ID column of data.

  • FORECAST: type DOUBLE, representing predicted values.

  • SE: type DOUBLE, standard error.

  • LO80: type DOUBLE, lower bound of the 80% interval.

  • HI80: type DOUBLE, upper bound of the 80% interval.

  • LO95: type DOUBLE, lower bound of the 95% interval.

  • HI95: type DOUBLE, upper bound of the 95% interval.

Note that if show.explainer=TRUE, the attribute explainer is generated.
When massive=TRUE, an additional error message DataFrame is returned.

Explaining the Forecasts of ARIMA

A forecast value alone is not enough to understand the model and the data in depth. Users also need to understand why a prediction was made before they can trust it and base decisions on it.
A financial analyst may want to know which latent stock affects the stock of interest the most, and relevant departments benefit from knowing the major factors behind varying weather, unemployment rates, electricity consumption, and so on. In all those scenarios, the importance of exogenous variables (variables that are not affected by others) can be quantified as the contributions they make.
Let us recap the formula of an ARIMAX model with p exogenous variables \(X_t = (x_t^{(1)}, x_t^{(2)}, \ldots, x_t^{(p)})\): $$\Phi(B)\phi(B^s)\widetilde{y_t}=\theta(B)\Theta(B^s)\epsilon_t+\sum_{i=1}^p\beta_i\tilde{x_t}^{(i)}$$, where \(\widetilde{y_t}=(1-B)^d(1-B^s)^Dy_t, \tilde{x_t}^{(i)}=(1-B)^d(1-B^s)^Dx_t^{(i)}\) and \(\beta_i\) is the corresponding regression coefficient.
As can be seen from the above formula, two major parts can be considered individually when explaining the forecasted values: the ARIMA part and the regressor part.
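
The differencing operator \((1-B)^d(1-B^s)^D\) in the formula can be illustrated with base R. The following is a minimal sketch, not the library's implementation; the helper name `difference_series` and the sample series are assumptions made for illustration only. It compares `diff()` against the explicit expansion for d = 1, D = 1, which is \(y_t - y_{t-1} - y_{t-s} + y_{t-s-1}\).

```r
# Apply (1 - B)^d (1 - B^s)^D to a numeric series using base R's diff().
# `difference_series` is a hypothetical helper, not part of hana.ml.r.
difference_series <- function(y, d = 0, D = 0, s = 1) {
  if (d > 0) y <- diff(y, lag = 1, differences = d)  # (1 - B)^d
  if (D > 0) y <- diff(y, lag = s, differences = D)  # (1 - B^s)^D
  y
}

y <- (1:12)^2                                  # illustrative series
z <- difference_series(y, d = 1, D = 1, s = 4)

# Explicit expansion for d = 1, D = 1, s = 4:
# y_t - y_{t-1} - y_{t-4} + y_{t-5}, defined for t = 6, ..., 12
idx <- 6:12
z_explicit <- y[idx] - y[idx - 1] - y[idx - 4] + y[idx - 5]

print(all.equal(z, z_explicit))
```

The same transformation is applied to each exogenous series \(x_t^{(i)}\) in the formula, so the regression coefficients \(\beta_i\) act on the differenced regressors.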

Interpreting the ARIMA Part

To interpret the ARIMA part, we have implemented a decomposition method derived from digital signal processing. In the frequency domain, any time series has components in the low-frequency, high-frequency, and band-frequency areas, as well as some irregular ones. In particular, the trend part is assumed to have low frequencies and the seasonal part to have high frequencies. These components need not be extracted with a Fast Fourier Transform (FFT): the auto-regressive and moving-average parts of an ARIMA model also reveal them. Therefore, our algorithm decomposes the forecasted values into trend, seasonal, transitory, and irregular parts directly from the trained model. If a component is marked "?" in the output DataFrame, it cannot be decomposed under the current settings.

Interpreting the Regressor Part

To interpret the regressor part, we adopted the famous LinearSHAP algorithm, which is able to generate the contribution of each exogenous feature to the forecasted values given the regression coefficients and background data.
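
As a rough illustration of the idea, the sketch below computes LinearSHAP-style contributions in base R under the usual feature-independence assumption, where the contribution of feature i is \(\beta_i (x_i - \bar{x}_i)\) and \(\bar{x}_i\) is the mean over the background data. The helper name `linear_shap`, the coefficients, and the data are hypothetical; the library's internal computation may differ.

```r
# LinearSHAP sketch for a linear term sum_i beta_i * x_i, assuming
# independent features. `linear_shap` is a hypothetical helper name.
linear_shap <- function(beta, x, background) {
  stopifnot(length(beta) == length(x), ncol(background) == length(beta))
  baseline <- colMeans(background)  # expected feature values from background data
  beta * (x - baseline)             # contribution of each feature
}

beta <- c(temperature = 2.0, promotion = -1.0)  # illustrative coefficients
background <- matrix(c(10, 12, 14,              # temperature column
                       0, 1, 1),                # promotion column
                     ncol = 2,
                     dimnames = list(NULL, names(beta)))
x_new <- c(temperature = 15, promotion = 1)     # row to explain

phi <- linear_shap(beta, x_new, background)
print(phi)  # temperature: 6, promotion: -1/3

# Contributions sum to the prediction minus the mean background prediction
print(sum(phi))
print(sum(beta * x_new) - mean(background %*% beta))
```

The last two printed values agree, which is the additivity property that makes the per-feature contributions interpretable as a split of the regressor part of the forecast.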

Key Relevant Parameters

  • background.size: This parameter specifies the size for background data for LinearSHAP, which must be set non-zero in hanaml.ARIMA in order to generate local interpretation.

  • show.explainer: This parameter serves as a trigger for interpreting the ARIMA model. Set it to TRUE if interpretability is desired.

  • top.k.attributions: This parameter specifies the number of attributions with highest contributions to the forecast values to output. Note that zero-contributed attributes shall not be displayed.

  • trend.mod: The real auto-regressive roots with inverse modulus larger than the value specified in trend.mod will be integrated into trend component.

  • trend.width: This parameter specifies the bandwidth of spectrum of trend component in unit of rad.

  • seasonal.width: This parameter specifies the bandwidth of spectrum of seasonal component in unit of rad.
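
To make the trend.mod criterion concrete, the following base R sketch computes the inverse moduli of the real roots of an AR polynomial and compares them with a threshold. It follows one literal reading of the parameter description; the helper name `real_root_inverse_moduli`, the tolerance, and the AR coefficients are assumptions, and the decomposition actually performed by the library is not shown here.

```r
# Inverse moduli of the real roots of the AR polynomial
# 1 - phi_1 z - ... - phi_p z^p.
# `real_root_inverse_moduli` is a hypothetical helper, not part of hana.ml.r.
real_root_inverse_moduli <- function(ar, tol = 1e-8) {
  roots <- polyroot(c(1, -ar))                 # coefficients in increasing order
  real_roots <- roots[abs(Im(roots)) < tol]    # keep (numerically) real roots
  1 / Mod(real_roots)
}

# Illustrative AR(2): 1 - 1.1 z + 0.3 z^2 = (1 - 0.5 z)(1 - 0.6 z)
ar <- c(1.1, -0.3)
inv_mod <- real_root_inverse_moduli(ar)

trend_mod <- 0.55                 # illustrative threshold
to_trend <- inv_mod > trend_mod   # TRUE for roots assigned to the trend
print(data.frame(inverse_modulus = inv_mod, to_trend = to_trend))
```

With the default threshold of 0.4, both roots of this illustrative polynomial would exceed the threshold; raising the threshold to 0.55 leaves only the root with inverse modulus 0.6.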

Examples

Call the function and obtain the result:


> predict(model=autoarima, forecast.length=5)
  TIMESTAMP   FORECAST       SE       LO80      HI80        LO95      HI95
1         0 -15.544832 3.298697 -19.772283 -11.31738 -22.0101587 -9.079505
2         1  35.587390 3.404891  31.223846  39.95094  28.9139269 42.260854
3         2  56.498532 3.411723  52.126231  60.87083  49.8116773 63.185386
4         3   7.086176 3.412170   2.713303  11.45905   0.3984467 13.773906
5         4 -16.266996 3.412250 -20.639972 -11.89402 -22.9548838 -9.579108
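
The interval columns in this output are numerically consistent with normal-theory bounds of the form FORECAST ± z × SE. The sketch below reproduces the first two rows in base R; it is an observation about the printed numbers, not a statement of how the library computes the bounds.

```r
# Rebuild the interval columns from FORECAST and SE for the first two
# rows of the example output, assuming normal quantiles.
pred <- data.frame(
  FORECAST = c(-15.544832, 35.587390),
  SE       = c(3.298697, 3.404891)
)

z80 <- qnorm(0.90)    # two-sided 80% interval
z95 <- qnorm(0.975)   # two-sided 95% interval

pred$LO80 <- pred$FORECAST - z80 * pred$SE
pred$HI80 <- pred$FORECAST + z80 * pred$SE
pred$LO95 <- pred$FORECAST - z95 * pred$SE
pred$HI95 <- pred$FORECAST + z95 * pred$SE

print(pred)
```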

To see the decomposition of the prediction result, set background.size when initializing an instance and set show.explainer = TRUE in predict():


> autoarm <- hanaml.AutoARIMA(data=data,
                              background.size=10)

Invoke predict():


> result <- predict(model=autoarm,
                    forecast.method="innovations.algorithm",
                    forecast.length=3,
                    show.explainer=TRUE)

Show the explainer of a hanaml.AutoARIMA instance:


> autoarm$explainer$Collect()
  TIMESTAMP     TREND   SEASONAL TRANSITORY IRREGULAR EXOGENOUS
1         0 0.1452041 -0.9329735  0.9274021 -24.93706
2         1 4.6110870  0.3368592 12.9455897  25.75553
3         2 6.6124186  0.8155893 17.1545481  47.95495

See also