Permutation Feature Importance

Permutation feature importance is a feature evaluation method that measures the decrease in the model score when a feature's values are randomly shuffled. Shuffling breaks the association between the feature and the true outcome, so the size of the decrease reveals how much the model relies on that feature for prediction.
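The idea can be sketched in a few lines of NumPy. This is only a minimal illustration under stated assumptions, not the PAL implementation: the function name permutation_importance, the toy predict function, and the r2 scorer are all hypothetical, and the score is assumed to be one where higher is better.

```python
import numpy as np

def permutation_importance(predict, score, X, y, n_repeats=5, seed=0):
    """Mean drop in score per feature after shuffling that feature's column."""
    rng = np.random.default_rng(seed)
    baseline = score(y, predict(X))          # score on the unshuffled data
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffle only column j; all other columns and y stay aligned.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops.append(baseline - score(y, predict(X_perm)))
        importances[j] = np.mean(drops)
    return importances

def r2(y_true, y_pred):
    """Coefficient of determination (higher is better)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

def predict(X):
    """Toy 'model' (hypothetical): uses column 0 and ignores column 1."""
    return 3.0 * X[:, 0]

# Toy data in which the target depends on column 0 only.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(200, 2))
y = 3.0 * X[:, 0]

importances = permutation_importance(predict, r2, X, y)
print(importances)
```

In this toy setup the ignored column gets an importance of exactly zero, because shuffling it cannot change the predictions, while the column the model depends on shows a large drop in score.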

Permutation importance is model-agnostic, and it avoids the bias against low-cardinality features found in the impurity-based feature importance of tree models. To take into account the feature's contribution to the model's generalization ability, SAP HANA PAL computes permutation importance on the validation set during the training procedure. In hana_ml, this functionality is exposed through the fit() method of the classes hana_ml.algorithms.pal.unified_classification.UnifiedClassification and hana_ml.algorithms.pal.unified_regression.UnifiedRegression, and is controlled by the following parameters:

  • permutation_importance : bool, optional

    Specifies whether to calculate permutation feature importance or not:

    • True : Yes

    • False : No

    Defaults to False.

  • permutation_evaluation_metric : str, optional

    Specifies the evaluation metric used in the permutation importance calculation.

    In UnifiedClassification: options are 'accuracy', 'auc', 'kappa', 'mcc'. In UnifiedRegression: options are 'rmse', 'mae', 'mape'.

    No default value.

  • permutation_n_repeats : int, optional

    Specifies the number of times to permute the values in a feature column.

    Defaults to 5.

  • permutation_seed : int, optional

    Specifies the seed used for randomly permuting a feature column.

    • 0 : Use current system time as seed

    • Others : Use the specified value as seed

    Defaults to 0.

  • permutation_n_samples : int, optional

    Specifies the number of samples to draw in each repeat.

    • 0 : draw all samples in each repeat (i.e. no sampling).

    • Others : draw the specified number of samples.

    The specified value must therefore be non-negative.

    If permutation_n_samples is larger than the number of samples provided in the validation set, then all samples will be used.
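The parameters above can be collected and checked before being passed to fit(). The sketch below is a hedged illustration rather than part of hana_ml: the helpers permutation_fit_kwargs and effective_sample_count are hypothetical names, and the checks encode only the constraints stated in this section. The commented-out call at the end assumes a hana_ml DataFrame named df_train with columns "ID" and "LABEL", and a train/validation partition requested through fit(); adapt it to your own data and verify the arguments against the hana_ml reference.

```python
CLASSIFICATION_METRICS = {"accuracy", "auc", "kappa", "mcc"}
REGRESSION_METRICS = {"rmse", "mae", "mape"}

def permutation_fit_kwargs(task, metric, n_repeats=5, seed=0, n_samples=0):
    """Build the permutation-importance keyword arguments for fit().

    Hypothetical helper: `task` is "classification" or "regression".
    """
    allowed = CLASSIFICATION_METRICS if task == "classification" else REGRESSION_METRICS
    if metric not in allowed:
        raise ValueError(f"metric {metric!r} is not valid for {task}: {sorted(allowed)}")
    if n_samples < 0:
        raise ValueError("permutation_n_samples must be non-negative")
    return {
        "permutation_importance": True,
        "permutation_evaluation_metric": metric,
        "permutation_n_repeats": n_repeats,
        "permutation_seed": seed,        # 0 means: seed from current system time
        "permutation_n_samples": n_samples,
    }

def effective_sample_count(n_samples, n_validation):
    """Samples used per repeat, following the rules described above."""
    if n_samples == 0 or n_samples > n_validation:
        return n_validation              # no sampling: use every validation row
    return n_samples

kwargs = permutation_fit_kwargs("classification", "accuracy", seed=2024, n_samples=500)
print(kwargs)

# Assumed usage (requires a live SAP HANA connection, so it is left commented out):
# from hana_ml.algorithms.pal.unified_classification import UnifiedClassification
# uc = UnifiedClassification(func="RandomDecisionTree")
# uc.fit(data=df_train, key="ID", label="LABEL",
#        partition_method="stratified", stratified_column="LABEL",
#        training_percent=0.8, **kwargs)
```

Using a fixed non-zero seed, as in this sketch, makes the shuffles repeatable across runs, whereas the default of 0 draws the seed from the system time.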