Permutation Feature Importance
Permutation feature importance is a feature evaluation method that measures the decrease in the model score when the values of a single feature are randomly shuffled. Shuffling breaks the association between the feature and the true outcome, so the size of the decrease reveals how much the model relies on that feature for prediction.
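The idea can be illustrated with a minimal, library-independent sketch. It is not the PAL implementation: the names `permutation_importance`, `accuracy`, and `toy_model` are hypothetical, the data is synthetic, and the score is assumed to be one where higher is better.

```python
import random
from statistics import mean


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, score, n_repeats=5, seed=1):
    """Return, per feature, the mean drop in score after shuffling that column.

    predict: callable mapping a list of rows to a list of predictions
    X:       list of rows (each row a list of feature values)
    y:       list of true outcomes, aligned with X
    score:   callable (y_true, y_pred) -> float, higher is better
    """
    rng = random.Random(seed)
    baseline = score(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle only column j; all other columns stay untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - score(y, predict(X_perm)))
        importances.append(mean(drops))
    return importances


# Synthetic validation data: the label depends only on feature 0,
# while feature 1 is pure noise.
data_rng = random.Random(0)
X_val = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y_val = [int(row[0] > 0.5) for row in X_val]


def toy_model(rows):
    """Stand-in for a trained model: thresholds feature 0, ignores feature 1."""
    return [int(row[0] > 0.5) for row in rows]


importances = permutation_importance(toy_model, X_val, y_val, accuracy)
print(importances)
```

In this sketch the importance of feature 1 is exactly zero, because the stand-in model never reads it, while shuffling feature 0 lowers the accuracy noticeably. For an error metric where lower is better, the same quantity would be measured as an increase in error rather than a decrease in score.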
Permutation importance is model-agnostic and avoids the bias against low-cardinality features found in the impurity-based feature importance of tree models. To account for each feature's contribution to the model's generalization ability, SAP HANA PAL computes permutation importance on the validation set during the training procedure. In hana_ml, this functionality is wrapped in the fit() method of the classes hana_ml.algorithms.pal.unified_classification.UnifiedClassification and hana_ml.algorithms.pal.unified_regression.UnifiedRegression, and is controlled by the following parameters:
permutation_importance: bool, optional
Specifies whether to calculate permutation feature importance:
True : Yes
False : No
Defaults to False.
permutation_evaluation_metric: str, optional
Specifies the evaluation metric used in the permutation importance calculation.
In UnifiedClassification: options are 'accuracy', 'auc', 'kappa', 'mcc'. In UnifiedRegression: options are 'rmse', 'mae', 'mape'.
No default value.
permutation_n_repeats: int, optional
Specifies the number of times to permute the values in a feature column.
Defaults to 5.
permutation_seed: int, optional
Specifies the seed used for randomly permuting a feature column.
0 : Use current system time as seed
Others : Use the specified value as seed
Defaults to 0.
permutation_n_samples: int, optional
Specifies the number of samples to draw in each repeat.
0 : Draw all samples in each repeat (i.e. no sampling).
Others : Draw the specified number of samples.
The specified value must be non-negative.
If permutation_n_samples is larger than the number of samples in the validation set, all samples are used.
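The parameter rules above can be sketched as a small helper that checks the values before they are passed to fit(). This is only an illustration: `effective_n_samples` and `build_permutation_kwargs` are hypothetical names that are not part of hana_ml, and the checks simply mirror the descriptions above rather than PAL's internal validation.

```python
# Metric options as listed in the parameter description above.
CLASSIFICATION_METRICS = {"accuracy", "auc", "kappa", "mcc"}
REGRESSION_METRICS = {"rmse", "mae", "mape"}


def effective_n_samples(permutation_n_samples, n_validation_rows):
    """Number of rows actually used per repeat, following the rules above."""
    if permutation_n_samples < 0:
        raise ValueError("permutation_n_samples must be non-negative")
    # 0 means "no sampling"; a value above the validation set size also
    # falls back to using all rows.
    if permutation_n_samples == 0 or permutation_n_samples > n_validation_rows:
        return n_validation_rows
    return permutation_n_samples


def build_permutation_kwargs(task, metric, n_repeats=5, seed=0, n_samples=0):
    """Assemble the permutation-related keyword arguments for fit().

    task: "classification" or "regression" (hypothetical switch used only
          by this helper to pick the allowed metric names)
    """
    if task == "classification":
        allowed = CLASSIFICATION_METRICS
    elif task == "regression":
        allowed = REGRESSION_METRICS
    else:
        raise ValueError("task must be 'classification' or 'regression'")
    if metric not in allowed:
        raise ValueError(f"metric must be one of {sorted(allowed)}")
    if n_samples < 0:
        raise ValueError("permutation_n_samples must be non-negative")
    return {
        "permutation_importance": True,
        "permutation_evaluation_metric": metric,
        "permutation_n_repeats": n_repeats,
        "permutation_seed": seed,  # 0 means the current system time is used
        "permutation_n_samples": n_samples,
    }


kwargs = build_permutation_kwargs("classification", "auc", seed=2024)
print(kwargs)

# Assumed usage with an already constructed UnifiedClassification object
# named `model` and a training DataFrame named `training_df`:
# model.fit(data=training_df, **kwargs)  # other fit() arguments omitted
```

Setting a non-zero permutation_seed, as in the example, makes the shuffles repeatable across runs, whereas the default of 0 seeds from the system time.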