Permutation Feature Importance for Time Series

Permutation importance for time series is a method for evaluating exogenous regressors. It measures how much the model's error score (for example, MSE) increases when the values of one exogenous regressor are randomly shuffled.

The same permutation technique supports two ways of calculating exogenous regressor importance: model-specific and model-free. In both cases, shuffling a regressor breaks the association between that regressor and the true values, so the resulting change in accuracy reveals how much the model relies on the regressor for forecasting.

  • Model-specific means calculating the exogenous regressor importance based on a specific model, for example ARIMAX, Bayesian structural time series (BSTS), long-term series forecasting (LTSF), or additive model time series analysis (AMTSA).

  • Model-free means calculating the exogenous regressor importance with a regression method such as RDT (Random Decision Trees).

With model-specific methods, you must provide the trained model table and a comparison table of true values to calculate the importance. With the model-free method, only the data table is required.
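The model-specific calculation can be sketched in plain Python. This is an illustration under stated assumptions, not PAL's implementation: `permutation_importance` and `mse` are hypothetical helpers, `predict` stands in for an already-trained model, and importance is taken as the mean increase in MSE over `repeat_time` shuffles of one column. PAL may aggregate or normalize the scores differently.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence

Row = List[float]


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean squared error between two equal-length sequences."""
    return mean((a - p) ** 2 for a, p in zip(actual, predicted))


def permutation_importance(
    predict: Callable[[List[Row]], List[float]],
    exog: List[Row],
    endog: List[float],
    column: int,
    repeat_time: int = 5,
    seed: int = 1,
) -> float:
    """Average increase in MSE after shuffling one exogenous column.

    `predict` is a stand-in for a trained model: it maps rows of
    exogenous values to forecasts of the endogenous series.
    """
    rng = random.Random(seed)
    baseline = mse(endog, predict(exog))
    increases = []
    for _ in range(repeat_time):
        shuffled = [row[column] for row in exog]
        rng.shuffle(shuffled)
        # Break the link between this regressor and the true values
        # while leaving every other column untouched.
        permuted = [row[:column] + [v] + row[column + 1:]
                    for row, v in zip(exog, shuffled)]
        increases.append(mse(endog, predict(permuted)) - baseline)
    return mean(increases)
```

For a toy model that forecasts `2 * x0` and ignores `x1`, shuffling column 0 raises the error, while shuffling column 1 leaves it unchanged, so the importance of column 1 is exactly zero.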

Parameters

data : DataFrame

Input data, including the key, endog, and exog columns.

model : DataFrame or str, optional
  • If a model-specific method is used, a trained model DataFrame of a time series algorithm is required. Currently, models from ARIMA, AutoARIMA, LTSF, Additive Model Forecast, and BSTS are supported.

  • If 'rdt' is provided, the model-free method is used.

Defaults to self.model_.

key : str, optional

The ID column.

Defaults to self.key.

endog : str, optional

The column of series to be tested.

Defaults to self.endog.

exog : str or a list of str, optional

The column(s) of exogenous regressors.

Defaults to self.exog.

repeat_time : int, optional

Indicates the number of times the exogenous regressor importance should be calculated for each column.

Defaults to 5.

random_state : int, optional

Specifies the seed for the random number generator.

  • 0 : Uses the current time (in seconds) as the seed.

  • Others : Uses the specified value as the seed.

Defaults to 0.
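The seed rule above can be sketched as follows. `make_rng` is a hypothetical helper that only mirrors the documented behavior; it is not part of the library. The practical consequence is that the default of 0 gives different shuffles on each run, so a non-zero value is needed for reproducible importance scores.

```python
import random
import time


def make_rng(random_state: int = 0) -> random.Random:
    """Return a generator seeded as the random_state parameter describes.

    0 means "seed from the current time in seconds"; any other value is
    used directly, so repeated runs produce the same shuffles.
    """
    seed = int(time.time()) if random_state == 0 else random_state
    return random.Random(seed)
```

Two generators created with `make_rng(42)` produce identical sequences, whereas two created with `make_rng(0)` in different seconds do not.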

thread_ratio : float, optional

The ratio of available threads to use.

  • 0 : Uses a single thread.

  • 0 to 1 : Uses the specified fraction of available threads. PAL uses all available threads if the value is 1.0.

  • -1 : Uses all available threads.

Defaults to -1.

partition_ratio : float, optional

Splits the input data into two parts: training data and comparison data. Only valid when model is None (no model is provided).

Defaults to 0.3.
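A chronological split can be sketched as below. This is a hypothetical helper resting on two assumptions that the parameter description does not confirm: that `partition_ratio` is the share of rows held out as comparison data, and that those rows are taken from the end of the time-ordered series so the model is never trained on observations later than the ones it is scored on.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_series(rows: Sequence[T],
                 partition_ratio: float = 0.3) -> Tuple[List[T], List[T]]:
    """Split time-ordered rows into (training, comparison) parts.

    Assumption: the last `partition_ratio` share of rows (at least one)
    becomes the comparison data; everything before it is training data.
    """
    if not 0.0 < partition_ratio < 1.0:
        raise ValueError("partition_ratio must be between 0 and 1")
    n_compare = max(1, round(len(rows) * partition_ratio))
    return list(rows[:-n_compare]), list(rows[-n_compare:])
```

With ten rows and the default ratio, the first seven rows would be used for training and the last three for comparison.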

regressor_top_k : int, optional

Captures the top K exogenous regressors.

Defaults to 10.

accuracy_measure : str or a list of str, optional

The metric(s) used to quantify how well the model fits the input data. Options: "mpe", "mse", "rmse", "mape".

No default value.

ignore_zero : bool, optional
  • False : Uses zero values in the input dataset when calculating "mpe" or "mape".

  • True : Ignores zero values in the input dataset when calculating "mpe" or "mape".

Only valid when accuracy_measure is "mpe" or "mape".

Defaults to False.
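The four accuracy measures, and the effect of ignore_zero on the percentage-based ones, can be sketched with their textbook definitions. `accuracy` is a hypothetical helper, results are returned as fractions rather than percentages, and PAL's exact scaling may differ. The description above does not say how PAL handles a zero true value when ignore_zero is False, so this sketch simply rejects that case.

```python
import math
from statistics import mean
from typing import Sequence


def accuracy(actual: Sequence[float], forecast: Sequence[float],
             measure: str, ignore_zero: bool = False) -> float:
    """Textbook MSE, RMSE, MPE, and MAPE (the last two as fractions)."""
    pairs = list(zip(actual, forecast))
    measure = measure.lower()
    if measure in ("mse", "rmse"):
        value = mean((a - f) ** 2 for a, f in pairs)
        return math.sqrt(value) if measure == "rmse" else value
    if measure in ("mpe", "mape"):
        if ignore_zero:
            # Drop observations whose true value is 0 instead of dividing by it.
            pairs = [(a, f) for a, f in pairs if a != 0]
        elif any(a == 0 for a, _ in pairs):
            raise ValueError(f"{measure} is undefined when a true value is 0")
        if not pairs:
            raise ValueError("no non-zero true values to evaluate")
        errors = [(a - f) / a for a, f in pairs]
        return mean(abs(e) for e in errors) if measure == "mape" else mean(errors)
    raise ValueError(f"unknown measure: {measure}")
```

MPE keeps the sign of each error, so over- and under-forecasts can cancel; MAPE takes absolute values, so they cannot.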

Returns

DataFrame

The importance of the exogenous regressor, structured as follows:

  • PAIR : Measure name.

  • NAME : Exogenous regressor name.

  • VALUE : The importance of the exogenous regressor.
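The result has one row per measure and regressor, so a common next step is to rank regressors within each measure. The sketch below is a hypothetical helper that works on plain (PAIR, NAME, VALUE) tuples, such as those produced by `res.collect()[["PAIR", "NAME", "VALUE"]].itertuples(index=False, name=None)`. It assumes that a larger VALUE means a more important regressor, which follows from importance being measured as an increase in error.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def rank_by_measure(
    rows: Iterable[Tuple[str, str, float]],
) -> Dict[str, List[Tuple[str, float]]]:
    """Group (PAIR, NAME, VALUE) rows by measure, most important first."""
    ranked: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for pair, name, value in rows:
        ranked[pair].append((name, value))
    for entries in ranked.values():
        # Larger importance (larger error increase) sorts to the front.
        entries.sort(key=lambda item: item[1], reverse=True)
    return dict(ranked)
```

Keeping the measures separate matters because values computed with different metrics, such as MSE and MAPE, are on different scales and should not be compared directly.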

Examples

Assume obj is an instance of ARIMA, AutoARIMA, AdditiveModelForecast, LTSF, or BSTS, and df_predict is a HANA DataFrame that was not used to fit the model.

>>> res = obj.get_permutation_importance(data=df_predict,
...                                      model=obj.model_,
...                                      accuracy_measure=['mse', 'mape'],
...                                      key='ID',
...                                      endog='TARGET')
>>> print(res.collect())