Explaining the Forecasts of ARIMA

Let's recap the formula of a seasonal ARIMAX model with p exogenous variables \(\mathbf{x}_t=[x^{(1)}_t, x^{(2)}_t,\ldots, x^{(p)}_t]\):

\[\phi(B)\Phi(B^s)\tilde{y}_t = \theta(B)\Theta(B^s)\epsilon_t + \sum_{i=1}^p \beta_i\tilde{x}_t^{(i)},\]

where \(B\) is the backshift operator (\(By_t=y_{t-1}\)), \(s\) is the seasonal period, \(\tilde{y}_t=(1-B)^d(1-B^s)^Dy_t\) and \(\tilde{x}_t^{(i)}=(1-B)^d(1-B^s)^Dx_t^{(i)}\) are the differenced target and regressors, and \(\{\beta_i\}_{i=1}^{p}\) are the corresponding regression coefficients. As the formula shows, the forecasted values (here the target of the explanation is \(y_t\)) can be explained by considering two major parts separately:
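As a concrete illustration of the differencing behind \(\tilde{y}_t\), the following minimal Python sketch applies \((1-B)^d(1-B^s)^D\) by repeated lagged differencing; the same function can be applied to each regressor \(x_t^{(i)}\) to obtain \(\tilde{x}_t^{(i)}\). The function names and the sample data are hypothetical and chosen only for this example.

```python
def difference(series, lag=1):
    """Apply (1 - B^lag) once: z_t = y_t - y_{t-lag}; the first `lag` values are lost."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def arima_difference(series, d, D, s):
    """Compute (1 - B)^d (1 - B^s)^D applied to `series` by repeated differencing."""
    out = list(series)
    for _ in range(D):
        out = difference(out, lag=s)  # seasonal differencing with period s
    for _ in range(d):
        out = difference(out, lag=1)  # ordinary first differencing
    return out


# Made-up data: a linear trend plus a period-4 seasonal pattern.
pattern = [0, 3, -1, 5]
y = [t + pattern[t % 4] for t in range(12)]

print(arima_difference(y, d=0, D=1, s=4))  # [4, 4, 4, 4, 4, 4, 4, 4]
print(arima_difference(y, d=1, D=1, s=4))  # [0, 0, 0, 0, 0, 0, 0]
```

The seasonal difference removes the repeating pattern and leaves the constant slope, which the ordinary difference then removes. Because the two operators commute, the order in which they are applied does not change the result; each application of \((1-B^k)\) shortens the series by \(k\) values.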

  1. The seasonal ARIMA part

  2. The regressor part
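To make the split concrete, here is a minimal sketch of a one-step-ahead forecast on the differenced scale, restricted for brevity to the non-seasonal case (no seasonal polynomials). It assumes the common sign convention in which the AR polynomial is \(1-\phi_1B-\cdots\) and the MA polynomial is \(1+\theta_1B+\cdots\), and it replaces the unknown future shock by its mean of zero. The function and argument names are hypothetical.

```python
def forecast_parts(y_tilde_hist, eps_hist, x_tilde_now, phi, theta, beta):
    """Split a one-step forecast of the differenced target into its two parts.

    Assumes the non-seasonal model
        (1 - phi_1 B - ... ) y~_t = (1 + theta_1 B + ... ) eps_t + sum_i beta_i x~_t^(i)
    with E[eps_t] = 0. Histories are ordered oldest -> newest.
    """
    # ARIMA part: AR terms on past differenced values plus MA terms on past shocks.
    ar = sum(c * y_tilde_hist[-j] for j, c in enumerate(phi, start=1))
    ma = sum(c * eps_hist[-k] for k, c in enumerate(theta, start=1))
    # Regressor part: one additive contribution per exogenous variable.
    regressors = [b * x for b, x in zip(beta, x_tilde_now)]
    return {
        "arima": ar + ma,
        "regressors": regressors,
        "forecast": ar + ma + sum(regressors),
    }


parts = forecast_parts(
    y_tilde_hist=[1.0, 2.0], eps_hist=[0.5, -0.5],
    x_tilde_now=[1.0, 3.0], phi=[0.5], theta=[0.2], beta=[2.0, -1.0],
)
print(parts)
```

Because the model is additive, the forecast is exactly the ARIMA part plus the sum of the per-regressor terms, which is what allows the two parts to be explained separately.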

In many scenarios, the role played by the regressors (exogenous variables) is very important. For example, a financial analyst may want to know which latent factor affects the stock of interest the most, and relevant departments benefit greatly from knowing which major factors, such as weather, unemployment rates, or energy consumption, drive the series. In such scenarios, the importance of each regressor can be quantified by the contribution it makes to the target values of the time series.

Interpretation of the Seasonal ARIMA Part

Interpretation of the seasonal ARIMA part utilizes a decomposition algorithm derived from digital signal processing.

In the frequency domain, a time series may have components in the low-frequency, high-frequency, and band-frequency ranges, as well as irregular ones. In particular, the trend can be thought of as associated with low frequencies and the seasonal component with high frequencies. In seasonal ARIMA modeling, instead of separating these components with the Fast Fourier Transform (FFT), we make the natural assumption that

  • the auto-regressive part is mainly associated with low frequencies;

  • the moving-average part is mainly associated with high frequencies.
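The frequency intuition behind these assumptions can be checked on the simplest cases. The sketch below evaluates the spectral densities of an AR(1) process, \(f(\omega)=\sigma^2/\big(2\pi(1+\phi^2-2\phi\cos\omega)\big)\), and an MA(1) process, \(f(\omega)=\sigma^2(1+\theta^2+2\theta\cos\omega)/(2\pi)\), on a grid over \([0,\pi]\) and reports where each one peaks. Note the assumption: with \(\phi>0\) the AR(1) spectrum peaks at frequency 0, and with \(\theta<0\) the MA(1) spectrum peaks at \(\pi\); other signs move the peaks, so this illustrates the heuristic rather than proving it. The function names are hypothetical.

```python
import math


def ar1_spectrum(phi, omega, sigma2=1.0):
    # f(w) = sigma^2 / (2*pi*|1 - phi*e^{-iw}|^2) = sigma^2 / (2*pi*(1 + phi^2 - 2*phi*cos w))
    return sigma2 / (2 * math.pi * (1 + phi**2 - 2 * phi * math.cos(omega)))


def ma1_spectrum(theta, omega, sigma2=1.0):
    # f(w) = sigma^2 * |1 + theta*e^{-iw}|^2 / (2*pi) = sigma^2 * (1 + theta^2 + 2*theta*cos w) / (2*pi)
    return sigma2 * (1 + theta**2 + 2 * theta * math.cos(omega)) / (2 * math.pi)


def peak_frequency(spectrum, n_grid=200):
    """Return the grid frequency in [0, pi] where `spectrum` is largest."""
    grid = [math.pi * k / n_grid for k in range(n_grid + 1)]
    return max(grid, key=spectrum)


print(peak_frequency(lambda w: ar1_spectrum(0.8, w)))   # 0.0 (lowest frequency)
print(peak_frequency(lambda w: ma1_spectrum(-0.6, w)))  # 3.141592653589793 (highest frequency)
```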

Therefore, the forecasted values are decomposed into trend, seasonal, transitory, and irregular components directly from the trained ARIMA model. If a specific component is missing from the generic ARIMA formula, its values are marked "?" in the output table.
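The "?" convention can be pictured with a small formatter. This is a hypothetical sketch of an output-table row, not the actual output code: the column order, the number formatting, and the use of a missing key or `None` to signal a component absent from the fitted model are all assumptions.

```python
COMPONENTS = ("trend", "seasonal", "transitory", "irregular")


def decomposition_row(forecast, components):
    """Render one output row; components absent from the model are shown as "?".

    `components` maps a component name to its value; a missing key or a value of
    None means the fitted model has no such component (hypothetical input format).
    """
    cells = [f"{forecast:.2f}"]
    for name in COMPONENTS:
        value = components.get(name)
        cells.append("?" if value is None else f"{value:.2f}")
    return " | ".join(cells)


print(" | ".join(("forecast",) + COMPONENTS))
# Example input with no seasonal component: that column is rendered as "?".
print(decomposition_row(12.5, {"trend": 11.0, "transitory": 0.75, "irregular": 0.75}))
# 12.50 | 11.00 | ? | 0.75 | 0.75
```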

Interpretation of the Regressor Part

To interpret the regressor part, the linearSHAP algorithm is adopted. Given the regression coefficients (and background data, if correlations among exogenous features are considered), it computes the contribution of each exogenous feature to the forecasted values.
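For a linear model with independent features, Linear SHAP has a closed form: the contribution of feature \(i\) is \(\beta_i(x_i-\mathbb{E}[x_i])\), and these contributions plus the base value \(\mathbb{E}[f(x)]\) add up exactly to the prediction. The minimal Python sketch below assumes independent regressors and uses background rows only to estimate the feature means; the correlated-feature variant mentioned above is not covered. In the ARIMAX setting, \(x\) would be the differenced regressors \(\tilde{x}_t^{(i)}\), and the contributions explain the regressor part only. The function name and the data are hypothetical; the open-source shap package offers a `LinearExplainer` class for the same purpose.

```python
def linear_shap(beta, x, background, intercept=0.0):
    """Linear SHAP values under the feature-independence assumption.

    For f(x) = intercept + sum_i beta_i * x_i, the SHAP value of feature i is
    beta_i * (x_i - E[x_i]); E[x_i] is estimated as the mean over `background` rows.
    Returns (base_value, contributions), where base_value = E[f(x)].
    """
    n = len(background)
    means = [sum(row[i] for row in background) / n for i in range(len(beta))]
    base_value = intercept + sum(b * m for b, m in zip(beta, means))
    contributions = [b * (xi - m) for b, xi, m in zip(beta, x, means)]
    return base_value, contributions


beta = [2.0, -1.0]
background = [[1.0, 0.0], [3.0, 4.0]]  # feature means are [2.0, 2.0]
x = [5.0, 1.0]
base, contrib = linear_shap(beta, x, background, intercept=0.5)
print(base, contrib)  # 2.5 [6.0, 1.0]

# Efficiency check: base value plus contributions reproduces the model output.
prediction = 0.5 + sum(b * xi for b, xi in zip(beta, x))
print(prediction, base + sum(contrib))  # 9.5 9.5
```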