Explaining the Forecasts of ARIMA

Let's recap the formula of a seasonal ARIMAX model with p exogenous variables \(\mathbf{x}_t=[x^{(1)}_t, x^{(2)}_t,\ldots, x^{(p)}_t]\):

\[\Phi(B)\phi(B^s)\tilde{y}_t = \theta(B)\Theta(B^s)\epsilon_t + \sum_{i=1}^p \beta_i\tilde{x}_t^{(i)},\]

where \(\tilde{y}_t=(1-B)^d(1-B^s)^Dy_t, \tilde{x}_t^{(i)}=(1-B)^d(1-B^s)^Dx_t^{(i)}\), and \(\{\beta_i, i=1,\ldots, p\}\) are the corresponding regression coefficients. As the formula shows, the forecasted values (here the target value for explanation is \(y_t\)) can be explained by considering two major parts individually:

  1. The seasonal ARIMA part

  2. The regressor part
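Both parts operate on differenced series. As a minimal sketch of what the operator \((1-B)^d(1-B^s)^D\) in the formula above does to a series, the following NumPy snippet applies D seasonal differences of period s and then d ordinary differences. The function name `difference` and the toy series are hypothetical and only illustrate the notation; they are not taken from any particular library.

```python
import numpy as np

def difference(series, d=0, D=0, s=1):
    """Apply (1 - B)^d (1 - B^s)^D to a 1-D series (illustrative helper)."""
    out = np.asarray(series, dtype=float)
    for _ in range(D):
        # Seasonal difference: y_t - y_{t-s}
        out = out[s:] - out[:-s]
    for _ in range(d):
        # Ordinary difference: y_t - y_{t-1}
        out = out[1:] - out[:-1]
    return out

# Hypothetical toy series: a linear trend plus a repeating period-4 pattern.
t = np.arange(12)
y = 2.0 * t + np.tile([1.0, -1.0, 3.0, 0.0], 3)

# One seasonal difference removes the period-4 pattern and leaves a constant;
# one ordinary difference then removes that constant.
y_tilde = difference(y, d=1, D=1, s=4)
print(y_tilde)
```

Each difference shortens the series (by s for a seasonal difference, by 1 for an ordinary one), so the result here has 12 - 4 - 1 = 7 values.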

In many scenarios, the regressors (exogenous variables) play a very important role. For example, financial analysts may want to know which underlying factors affect a stock of interest the most. Other organizations likewise benefit from knowing which factors, such as changing weather, unemployment rates, or energy consumption, matter most for the series they forecast. In such scenarios, the importance of each regressor can be quantified by its contribution to the target values of the time series.

Interpretation of the Seasonal ARIMA Part

Interpretation of the seasonal ARIMA part utilizes a decomposition algorithm derived from digital signal processing.

Viewed in the frequency domain, any time series may have components in the low-frequency, high-frequency, and band-frequency ranges, as well as some irregular ones. In particular, the trend part can be thought of as being associated with low frequencies, and the seasonal part with high frequencies. In seasonal ARIMA modeling of time-series data, instead of using the Fast Fourier Transform (FFT), we make the natural assumption that

  • the auto-regressive part is mainly associated with low-frequencies;

  • the moving-average part is mainly associated with high-frequencies.

Therefore, the forecasted values are divided into trend, seasonal, transitory, and irregular parts directly from the trained ARIMA model. If a specific component is missing from the generic ARIMA formula, its corresponding values are marked "?" in the output table in our setting.

Interpretation of the Regressor Part

To interpret the regressor part, the linearSHAP algorithm is adopted. Given the regression coefficients (and background data, if correlations among exogenous features are considered), it generates the contribution of each exogenous feature to the forecasted values.
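As a rough illustration, when the exogenous features are treated as independent, linear SHAP reduces to the closed form \(\beta_i(x^{(i)}_t - b_i)\), where \(b_i\) is a baseline (reference) value for feature i. The sketch below assumes that independent case only. The function `linear_shap`, its `baseline` parameter, and the numbers are hypothetical. Which baseline the actual implementation uses is not stated here, so the baseline is left as an explicit argument that defaults to zero, and the correlated case that requires background data is not covered.

```python
import numpy as np

def linear_shap(beta, x, baseline=None):
    """Per-feature contributions beta_i * (x_i - baseline_i).

    Assumes independent features; `baseline` defaults to zeros
    (an assumption of this sketch, not a documented default).
    """
    beta = np.asarray(beta, dtype=float)
    x = np.asarray(x, dtype=float)
    if baseline is None:
        baseline = np.zeros_like(beta)
    return beta * (x - np.asarray(baseline, dtype=float))

# Hypothetical regression coefficients for two exogenous features.
beta = np.array([0.5, -2.0])

# Hypothetical background data; its column means serve as the baseline.
background = np.array([[1.0, 0.0],
                       [3.0, 2.0],
                       [2.0, 1.0]])
baseline = background.mean(axis=0)

# Feature values at the forecast time step being explained.
x_t = np.array([4.0, 0.0])

contrib = linear_shap(beta, x_t, baseline)
print(contrib)
```

The contributions sum to the difference between the regressor term at \(\mathbf{x}_t\) and the regressor term at the baseline, which is the additivity property that makes the per-feature values comparable.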