permutation_importance
- hana_ml.algorithms.pal.tsa.permutation_importance.permutation_importance(data, model=None, key=None, endog=None, exog=None, repeat_time=None, random_state=None, thread_ratio=None, partition_ratio=None, regressor_top_k=None, accuracy_measure=None, ignore_zero=None)
Permutation importance for time series is an exogenous regressor evaluation method that measures the increase in the model score when randomly shuffling the exogenous regressor's values.
The exogenous regressor importance can be calculated in two ways, model-specific and model-free, both based on the same permutation importance method. Each reveals how much the model relies on an exogenous regressor for forecasting by breaking the association between that regressor and the true value.
Model-specific means calculating the exogenous regressor importance based on a specific model, for example ARIMAX, Bayesian structural time series (BSTS), long-term series forecasting (LTSF), or additive model time series analysis (AMTSA).
Model-free means calculating the exogenous regressor importance by using a regression method such as Random Decision Trees (RDT), for example for an exponential smoothing series.
With model-specific methods, you must provide the trained model table and the true-value table to compare against. With model-free methods, only the data table is required.
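The idea described above can be illustrated with a minimal pure-Python sketch. It is not the PAL implementation: the helper `permutation_importance_sketch`, the callable `predict` standing in for a trained model, and the toy data are all hypothetical, and MSE is used as the only accuracy measure. It shuffles one exogenous column at a time, repeats this `repeat_time` times, and averages the increase in error over the unshuffled baseline.

```python
import random

def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_importance_sketch(predict, rows, y_true, names,
                                  repeat_time=5, seed=1):
    """Average increase in MSE after shuffling each exogenous column.

    predict : callable mapping a dict of regressor values to a forecast
              (hypothetical stand-in for an already trained model).
    rows    : list of dicts, one per time step, keyed by regressor name.
    """
    rng = random.Random(seed)
    baseline = mse(y_true, [predict(r) for r in rows])
    importance = {}
    for name in names:
        increases = []
        for _ in range(repeat_time):
            # Shuffle only this column to break its link with the target.
            shuffled = [r[name] for r in rows]
            rng.shuffle(shuffled)
            permuted = [dict(r, **{name: v}) for r, v in zip(rows, shuffled)]
            error = mse(y_true, [predict(r) for r in permuted])
            increases.append(error - baseline)
        importance[name] = sum(increases) / repeat_time
    return importance

# Toy data (assumption): the target depends on X1 only; X2 is irrelevant.
data_rng = random.Random(0)
rows = [{"X1": data_rng.uniform(0, 10), "X2": data_rng.uniform(0, 10)}
        for _ in range(200)]
y_true = [3.0 * r["X1"] for r in rows]
model = lambda r: 3.0 * r["X1"]  # hypothetical "trained model"

scores = permutation_importance_sketch(model, rows, y_true, ["X1", "X2"])
```

In this sketch, shuffling `X1` raises the error because the model depends on it, while shuffling `X2` leaves the predictions unchanged, so its importance is zero.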
- Parameters:
- data : DataFrame
Input data.
If model is provided, the prediction dataset (key and exog columns) and the true values (target column) are required.
If no model is provided, supply the data used for both fitting and prediction.
- model : DataFrame, optional
If model-specific methods are used, the trained model DataFrame of a time series algorithm is required. Models from ARIMA, AutoARIMA, LTSF, Additive Model Forecast, and BSTS are currently supported.
Defaults to None.
- key : str, optional
The ID column.
Defaults to the first column of data if the index column of data is not provided. Otherwise, defaults to the index column of data.
- endog : str, optional
The column of series to be tested.
Defaults to the first non-key column.
- repeat_time : int, optional
Indicates the number of times the exogenous regressor importance should be calculated for each column.
Defaults to 5.
- random_state : int, optional
Specifies the seed for the random number generator.
0: Uses the current time (in seconds) as the seed.
Others: Uses the specified value as seed.
Defaults to 0.
- thread_ratio : float, optional
Adjusts the percentage of available threads to use, from 0 to 1. A value of 0 indicates the use of a single thread, while 1 implies the use of all possible current threads. Values outside the range will be ignored and this function heuristically determines the number of threads to use.
Defaults to -1.
- partition_ratio : float, optional
Splits the input data into two parts: training data and compare data.
Only valid when model is None (no model is provided).
Defaults to 0.3.
- regressor_top_k : int, optional
Captures the top K exogenous regressors.
Defaults to 10.
- accuracy_measure : str or list of str, optional
The metric to quantify how well a model fits input data. Options: "mpe", "mse", "rmse", "mape".
No default value.
- ignore_zero : bool, optional
False: Uses zero values in the input dataset when calculating "mpe" or "mape".
True: Ignores zero values in the input dataset when calculating "mpe" or "mape".
Only valid when accuracy_measure is "mpe" or "mape".
Defaults to False.
- Returns:
- DataFrame
The importance of the exogenous regressor, structured as follows:
PAIR : Measure name.
NAME : Exogenous regressor name.
VALUE : The importance of the exogenous regressor.
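To make the accuracy_measure options and the ignore_zero flag more concrete, here is a small sketch using the common textbook definitions of these metrics. It assumes MPE and MAPE divide by the actual value; the exact formulas and zero handling inside PAL are not stated on this page, so the function names and the behaviour on zero actuals below are illustrative only.

```python
import math

def _pairs(actual, forecast, ignore_zero):
    """Return (actual, forecast) pairs, optionally skipping zero actuals."""
    return [(a, f) for a, f in zip(actual, forecast)
            if not (ignore_zero and a == 0)]

def mse(actual, forecast):
    """Mean squared error."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    """Root mean squared error."""
    return math.sqrt(mse(actual, forecast))

def mpe(actual, forecast, ignore_zero=False):
    """Mean percentage error (signed), relative to the actual value."""
    pairs = _pairs(actual, forecast, ignore_zero)
    return sum((a - f) / a for a, f in pairs) / len(pairs)

def mape(actual, forecast, ignore_zero=False):
    """Mean absolute percentage error, relative to the actual value."""
    pairs = _pairs(actual, forecast, ignore_zero)
    return sum(abs((a - f) / a) for a, f in pairs) / len(pairs)

# Illustrative values only; the first actual is zero.
actual = [0.0, 10.0, 20.0]
forecast = [1.0, 8.0, 25.0]

mse_value = mse(actual, forecast)
mape_value = mape(actual, forecast, ignore_zero=True)
mpe_value = mpe(actual, forecast, ignore_zero=True)
```

In this sketch, calling `mape` or `mpe` with `ignore_zero=False` on data containing a zero actual raises `ZeroDivisionError`, which shows why the flag only matters for the two percentage-based measures.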
Examples
Example 1: model-specific
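>>> from hana_ml.algorithms.pal.tsa.bsts import BSTS
>>> from hana_ml.algorithms.pal.tsa.permutation_importance import permutation_importance
>>> bsts = BSTS(burn=0.6, expected_model_size=1, niter=200, seed=1)
>>> bsts.fit(data=df_fit, key='ID', endog='TARGET')
>>> pires = permutation_importance(data=df_predict, accuracy_measure=['mse', 'mape'], regressor_top_k=3, model=bsts.model_, key='ID', endog='TARGET')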
Example 2: model-free (no model is provided)
>>> pires = permutation_importance(data=df, accuracy_measure=['mse', 'mape'], random_state=1, regressor_top_k=4, key='ID', endog='TARGET')