Permutation Feature Importance for Time Series
Permutation importance for time series is a method for evaluating exogenous regressors. It measures how much the model's error score increases when the values of one exogenous regressor are randomly shuffled.
Two variants build on the same permutation procedure: model-specific and model-free. Both break the association between an exogenous regressor and the true values. The resulting change in the score shows how much the forecast relies on that regressor.
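The core procedure can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the PAL implementation. The fitted model is exposed as a hypothetical `predict` function over rows of dicts, and MSE serves as the score. The sketch also mimics the `repeat_time` and `random_state` parameters described below.

```python
import random
import time
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean squared error between two equal-length sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def permutation_importance(
    predict: Callable[[List[Row]], List[float]],  # hypothetical fitted-model wrapper
    rows: List[Row],
    endog: str,
    exog: List[str],
    repeat_time: int = 5,
    random_state: int = 0,
) -> Dict[str, float]:
    """Average increase in MSE when each exogenous column is shuffled."""
    # Mimic the documented seed rule: 0 means "seed from the current time".
    seed = int(time.time()) if random_state == 0 else random_state
    rng = random.Random(seed)
    actual = [r[endog] for r in rows]
    baseline = mse(actual, predict(rows))
    importance: Dict[str, float] = {}
    for col in exog:
        increases = []
        for _ in range(repeat_time):
            # Shuffle one regressor to break its link with the target,
            # leaving every other column untouched.
            values = [r[col] for r in rows]
            rng.shuffle(values)
            permuted = [{**r, col: v} for r, v in zip(rows, values)]
            increases.append(mse(actual, predict(permuted)) - baseline)
        importance[col] = sum(increases) / repeat_time
    return importance


def model(batch: List[Row]) -> List[float]:
    """Toy model that uses x1 and ignores x2 entirely."""
    return [2.0 * r["x1"] for r in batch]


rows = [{"x1": float(i), "x2": float(i % 3), "y": 2.0 * i} for i in range(10)]
print(permutation_importance(model, rows, "y", ["x1", "x2"], random_state=42))
```

The toy model never reads x2, so x2 gets an importance of exactly 0.0. Shuffling x1, by contrast, raises the error, so x1 gets a positive value.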
Model-specific means the importance is calculated with a specific trained time series model. Examples include ARIMAX, Bayesian structural time series (BSTS), long-term series forecasting (LTSF), and additive model time series analysis (AMTSA).
Model-free means the importance is calculated with a general regression method such as RDT (Random Decision Trees).
A model-specific method needs two inputs: the trained model table and a table of true values to compare against. A model-free method needs only the data table.
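A model-free variant trains its own regressor on part of the data and then permutes regressors on the held-out part. To stay short, the sketch below swaps in a simple k-nearest-neighbour regressor for RDT. It also assumes a chronological split in which `partition_ratio` is the share of rows kept for comparison. PAL's actual split and tree model may differ, and all function names here are hypothetical.

```python
import random
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]


def split_by_ratio(rows: List[Row], partition_ratio: float) -> Tuple[List[Row], List[Row]]:
    """Assumed chronological split: the last `partition_ratio` share is compare data."""
    n_compare = max(1, round(len(rows) * partition_ratio))
    return rows[:-n_compare], rows[-n_compare:]


def fit_knn(train: List[Row], endog: str, exog: List[str],
            k: int = 3) -> Callable[[List[Row]], List[float]]:
    """Stand-in for RDT: predict the mean target of the k nearest training rows."""
    def predict(batch: List[Row]) -> List[float]:
        preds = []
        for r in batch:
            nearest = sorted(train, key=lambda t: sum((t[c] - r[c]) ** 2 for c in exog))[:k]
            preds.append(sum(t[endog] for t in nearest) / len(nearest))
        return preds
    return predict


def model_free_importance(rows: List[Row], endog: str, exog: List[str],
                          partition_ratio: float = 0.3, seed: int = 1) -> Dict[str, float]:
    """Increase in MSE on the compare part when each regressor is shuffled once."""
    train, compare = split_by_ratio(rows, partition_ratio)
    predict = fit_knn(train, endog, exog)
    actual = [r[endog] for r in compare]

    def mse(pred: List[float]) -> float:
        return sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual)

    baseline = mse(predict(compare))
    rng = random.Random(seed)
    result: Dict[str, float] = {}
    for col in exog:
        # Permute one regressor within the compare part only.
        values = [r[col] for r in compare]
        rng.shuffle(values)
        permuted = [{**r, col: v} for r, v in zip(compare, values)]
        result[col] = mse(predict(permuted)) - baseline
    return result


# y depends only on x1; x2 is a constant that carries no information.
rows = [{"x1": float(i % 5), "x2": 1.0, "y": 10.0 * (i % 5)} for i in range(20)]
print(model_free_importance(rows, "y", ["x1", "x2"]))
```

Shuffling a constant column changes nothing, so x2 comes out as exactly 0.0.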
Parameters
- data : DataFrame
Input data, including the key, endog, and exog columns.
- model : DataFrame or str, optional
For model-specific methods, a trained model DataFrame of a time series algorithm is required. Currently supported models: ARIMA, AutoARIMA, LTSF, Additive Model Forecast, and BSTS.
If 'rdt' is provided, the model-free method is used.
Defaults to self.model_.
- key : str, optional
The ID column.
Defaults to self.key.
- endog : str, optional
The column containing the series to be tested.
Defaults to self.endog.
- exog : str or a list of str, optional
The column(s) of exogenous regressors.
Defaults to self.exog.
- repeat_time : int, optional
Indicates the number of times the exogenous regressor importance should be calculated for each column.
Defaults to 5.
- random_state : int, optional
Specifies the seed for the random number generator.
0 : Uses the current time (in seconds) as the seed.
Others : Uses the specified value as seed.
Defaults to 0.
- thread_ratio : float, optional
The ratio of available threads.
0 : single thread.
0~1 : uses the specified percentage of available threads. PAL uses all available threads if the number is 1.0.
-1 : uses all available threads.
Defaults to -1.
- partition_ratio : float, optional
Splits the input data into two parts: training data and comparison data. Only valid when model is None (no model is provided).
Defaults to 0.3.
- regressor_top_k : int, optional
Captures the top K exogenous regressors.
Defaults to 10.
- accuracy_measure : str or a list of str, optional
The metric(s) used to quantify how well the model fits the input data. Options: "mpe", "mse", "rmse", "mape".
No default value.
- ignore_zero : bool, optional
False : Uses zero values in the input dataset when calculating "mpe" or "mape".
True : Ignores zero values in the input dataset when calculating "mpe" or "mape".
Only valid when accuracy_measure is "mpe" or "mape".
Defaults to False.
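For reference, the four accuracy measures and the effect of `ignore_zero` can be written out in Python. These are the conventional textbook definitions, expressed as fractions (not percentages), with error defined as actual minus forecast. Whether PAL uses the same scaling and sign convention is an assumption here. The `accuracy` helper is hypothetical. Where a zero actual value would make MPE or MAPE undefined, it raises an error.

```python
import math
from typing import Sequence


def accuracy(measure: str, actual: Sequence[float], forecast: Sequence[float],
             ignore_zero: bool = False) -> float:
    """Conventional MPE/MAPE (as fractions), MSE and RMSE; error = actual - forecast."""
    measure = measure.lower()
    pairs = list(zip(actual, forecast))
    if measure in ("mpe", "mape"):
        if ignore_zero:
            # Drop points whose actual value is zero, as ignore_zero=True describes.
            pairs = [(a, f) for a, f in pairs if a != 0]
        elif any(a == 0 for a, _ in pairs):
            # Division by a zero actual is undefined; this sketch refuses instead of guessing.
            raise ValueError(f"{measure} is undefined when an actual value is 0")
        if not pairs:
            raise ValueError("no non-zero actual values left")
        ratios = [(a - f) / a for a, f in pairs]
        if measure == "mape":
            ratios = [abs(r) for r in ratios]
        return sum(ratios) / len(ratios)
    squared = [(a - f) ** 2 for a, f in pairs]
    if measure == "mse":
        return sum(squared) / len(squared)
    if measure == "rmse":
        return math.sqrt(sum(squared) / len(squared))
    raise ValueError(f"unsupported measure: {measure}")


actual, forecast = [0.0, 2.0, 4.0], [1.0, 1.0, 6.0]
for m in ("mse", "rmse", "mpe", "mape"):
    print(m, accuracy(m, actual, forecast, ignore_zero=True))
```

With `ignore_zero=True`, MPE and MAPE here use only the last two points. With the default `False`, the zero actual value makes the helper raise instead. MSE and RMSE always use all points.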
Returns
- DataFrame
The importance of the exogenous regressor, structured as follows:
PAIR : Measure name.
NAME : Exogenous regressor name.
VALUE : The importance of the exogenous regressor.
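The result is in long format, with one row per measure and regressor, so it is often convenient to pivot it or to rank regressors for a single measure. The sketch below works on plain `(PAIR, NAME, VALUE)` tuples. The measure labels and values in the usage are invented for illustration. It also assumes that a larger VALUE means heavier reliance on the regressor. Both helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

ResultRow = Tuple[str, str, float]  # (PAIR, NAME, VALUE)


def pivot_importance(rows: Iterable[ResultRow]) -> Dict[str, Dict[str, float]]:
    """Reshape long-format rows into {regressor name: {measure: importance}}."""
    table: Dict[str, Dict[str, float]] = defaultdict(dict)
    for pair, name, value in rows:
        table[name][pair] = value
    return dict(table)


def top_k(rows: Iterable[ResultRow], measure: str, k: int) -> List[str]:
    """Regressor names for one measure, sorted by importance, largest first."""
    scored = [(value, name) for pair, name, value in rows if pair == measure]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [name for _, name in scored[:k]]


# Invented values, for illustration only.
result = [
    ("MSE", "X1", 0.8), ("MSE", "X2", 0.1), ("MSE", "X3", 0.5),
    ("MAPE", "X1", 0.2), ("MAPE", "X2", 0.3),
]
print(top_k(result, "MSE", 2))  # ['X1', 'X3']
print(pivot_importance(result)["X2"])  # {'MSE': 0.1, 'MAPE': 0.3}
```

In hana_ml, calling `collect()` on the returned DataFrame gives a pandas DataFrame. Calling `list(df.itertuples(index=False, name=None))` on it should then yield such tuples, assuming the columns arrive in PAIR, NAME, VALUE order.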
Examples
Assume obj is a fitted instance of ARIMA, AutoARIMA, AdditiveModelForecast, LTSF, or BSTS, and df_predict is a HANA DataFrame that was not used to fit the model.
>>> res = obj.get_permutation_importance(data=df_predict,
...                                      model=obj.model_,
...                                      accuracy_measure=['mse', 'mape'],
...                                      key='ID',
...                                      endog='TARGET')
>>> print(res.collect())