FairMLClassification

class hana_ml.algorithms.pal.fair_ml.FairMLClassification(fair_submodel='HGBT', fair_constraint='demographic_parity', fair_loss_func='error_rate', fair_num_max_iter=None, fair_num_min_iter=None, fair_learning_rate=None, fair_norm_bound=None, fair_ratio=None, fair_relax=None, fair_threshold=None, fair_exclude_sensitive_variable=None, **kwargs)

FairMLClassification aims to mitigate unfairness in a prediction model caused by possible bias in the data set with respect to features such as sex, race, or age. It is a framework that can build on other machine learning models or techniques, which makes it quite flexible.

Parameters:
fair_submodel : {'HGBT'}, optional

Specifies the submodel type.

Defaults to 'HGBT'.

fair_constraint : {'demographic_parity', 'equalized_odds', 'true_positive_rate_parity', 'false_positive_rate_parity', 'error_rate_parity'}, optional

Specifies the fairness constraint to enforce.

Defaults to 'demographic_parity'.

fair_loss_func : {'error_rate'}, optional

Specifies the loss function.

Defaults to 'error_rate'.

fair_num_max_iter : int, optional

Specifies the maximum number of iterations performed. Must be greater than or equal to fair_num_min_iter.

Defaults to 50.

fair_num_min_iter : int, optional

Specifies the minimum number of iterations performed. Must be less than or equal to fair_num_max_iter.

Defaults to 5.

fair_learning_rate : float, optional

Specifies the learning rate.

Defaults to 0.02.

fair_norm_bound : float, optional

Specifies the bound of the Lagrange multiplier. Must be positive.

Defaults to 100.

fair_ratio : float, optional

Specifies the ratio of error allowed in the constraint. Must be in the range (0, 1].

Defaults to 1.0.

fair_relax : float, optional

Specifies the relaxation of the constraint. Must be non-negative.

Defaults to 0.01.

fair_threshold : float, optional

Specifies the threshold that determines when the algorithm stops iterating. A greater value gives higher accuracy but takes more time. Must be positive; if 0 is given, the threshold is decided heuristically.

Defaults to 0.0.

fair_exclude_sensitive_variable : bool, optional

Specifies whether or not to exclude sensitive variables when training the fairness-aware model.

Defaults to True, i.e. the sensitive variable(s) are excluded from the trained model by default.

**kwargs : keyword arguments

Parameters for initializing the submodel used for fair classification. Since 'HGBT' is the only submodel listed above, these are the initialization parameters of HybridGradientBoostingClassifier.

Please see HybridGradientBoostingClassifier for more details.
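
The value constraints listed for the parameters above can be checked on the client before the estimator is constructed. The following is a minimal sketch, not part of hana_ml: `check_fair_params` is a hypothetical helper, and it assumes that the documented defaults (50 and 5) apply whenever an iteration parameter is left as None.

```python
def check_fair_params(fair_num_max_iter=None, fair_num_min_iter=None,
                      fair_norm_bound=None, fair_ratio=None,
                      fair_relax=None, fair_threshold=None):
    """Return a list of violated constraints (empty if none).

    Hypothetical client-side helper; it only mirrors the constraints
    stated in the parameter descriptions.
    """
    # Assumption: None means the documented default is used.
    max_iter = 50 if fair_num_max_iter is None else fair_num_max_iter
    min_iter = 5 if fair_num_min_iter is None else fair_num_min_iter

    problems = []
    if min_iter > max_iter:
        problems.append("fair_num_min_iter must be <= fair_num_max_iter")
    if fair_norm_bound is not None and fair_norm_bound <= 0:
        problems.append("fair_norm_bound must be positive")
    if fair_ratio is not None and not (0 < fair_ratio <= 1):
        problems.append("fair_ratio must be in (0, 1]")
    if fair_relax is not None and fair_relax < 0:
        problems.append("fair_relax must be non-negative")
    # 0 is accepted because it requests the heuristic threshold.
    if fair_threshold is not None and fair_threshold < 0:
        problems.append("fair_threshold must not be negative")
    return problems
```

An empty list means the arguments are consistent with the documented ranges; the server may still apply checks that this sketch does not cover.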

Examples

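>>> from hana_ml.algorithms.pal.fair_ml import FairMLClassification
>>> # df and df_predict are assumed to be hana_ml DataFrames; 'gender' is a column of df
>>> fair_ml = FairMLClassification(fair_submodel='HGBT', fair_constraint='demographic_parity')
>>> fair_ml.fit(data=df, fair_sensitive_variable='gender')
>>> res = fair_ml.predict(data=df_predict)

Attributes:
model_ : DataFrame

Model content, structured as follows:

  • 1st column: ROW_INDEX, indicates the row.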

  • 2nd column: MODEL_CONTENT, model value.

stats_ : DataFrame

Related statistics, structured as follows:

  • 1st column: STAT_NAME, statistics name.

  • 2nd column: STAT_VALUE, statistics value.
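
Because stats_ has a name column and a value column, its rows map naturally to a dictionary once they have been brought to the client (for example after collecting the DataFrame). The sketch below assumes the rows are available as (STAT_NAME, STAT_VALUE) pairs; `stats_to_dict` is a hypothetical helper, and the row values are placeholders rather than real output.

```python
def stats_to_dict(rows):
    """Turn (STAT_NAME, STAT_VALUE) pairs into a name -> value dictionary."""
    return {name: value for name, value in rows}


# Placeholder rows for illustration only; real statistic names will differ.
rows = [("EXAMPLE_STAT_A", "0.1"), ("EXAMPLE_STAT_B", "7")]
stats = stats_to_dict(rows)
```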

Methods

fit(data[, key, features, label, ...])

The fit function of FairMLClassification.

predict(data[, key, features, thread_ratio, ...])

Predict function for Fair ML.

fit(data, key=None, features=None, label=None, fair_sensitive_variable=None, categorical_variable=None, fair_positive_label=None, thread_ratio=None)

The fit function of FairMLClassification.

Parameters:
data : DataFrame

The input data for training.

key : str, optional

Specifies the ID column.

If data is indexed by a single column, then key defaults to that index column; otherwise key is mandatory.

features : list of str, optional

Names of the feature columns.

If features is not provided, it defaults to all non-ID, non-label columns.

label : str, optional

Name of the dependent variable. If label is not provided, it defaults to the last non-ID column.

fair_sensitive_variable : str or list of str

Specifies the name(s) of the sensitive variable(s). Multiple variables can be given as a list.

categorical_variable : str or list of str, optional

Specifies INTEGER column(s) that should be treated as categorical data.

Other INTEGER columns will be treated as continuous.

fair_positive_label : str, optional

Specifies the label that stands for the positive case. Mandatory if fair_constraint is set to 'true_positive_rate_parity' or 'false_positive_rate_parity'.

thread_ratio : float, optional

Controls the proportion of available threads to use.

Defaults to 1.0.
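
The defaulting rules for features and label, and the condition under which fair_positive_label becomes mandatory, can be expressed in a few lines. This is a sketch of the documented rules only, not the library's implementation: `resolve_fit_columns` and `positive_label_required` are hypothetical names, the column names are made up, and the effect of fair_exclude_sensitive_variable is not modeled.

```python
# Constraints for which the text above marks fair_positive_label as mandatory.
TPR_FPR_CONSTRAINTS = {"true_positive_rate_parity", "false_positive_rate_parity"}


def resolve_fit_columns(columns, key, features=None, label=None):
    """Apply the documented defaults for `features` and `label`."""
    non_id = [c for c in columns if c != key]
    if label is None:
        # Default label: the last non-ID column.
        label = non_id[-1]
    if features is None:
        # Default features: all non-ID, non-label columns.
        features = [c for c in non_id if c != label]
    return features, label


def positive_label_required(fair_constraint):
    """True if fair_positive_label must be supplied for this constraint."""
    return fair_constraint in TPR_FPR_CONSTRAINTS


# Made-up column names for illustration.
columns = ["ID", "AGE", "INCOME", "GENDER", "APPROVED"]
features, label = resolve_fit_columns(columns, key="ID")
```

With these columns, label resolves to the last non-ID column and features to the remaining non-ID columns.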

predict(data, key=None, features=None, thread_ratio=None, model=None)

Predict function for Fair ML.

Parameters:
data : DataFrame

Data to be predicted.

key : str, optional

Name of the ID column.

Mandatory if data is not indexed, or is indexed by multiple columns.

Defaults to the index of data if data is indexed by a single column.

features : list of str, optional

Names of the feature columns.

If features is not provided, it defaults to all non-ID columns.

thread_ratio : float, optional

Controls the proportion of available threads to use.

Defaults to 1.0.

model : DataFrame, optional

The model to be used for prediction.

Defaults to the fitted model (model_).

Returns:
DataFrame
  • Predicted result, structured as follows:

    • 1st column: Data type and name same as the 1st column of data.

    • 2nd column: SCORE, class labels.
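
One way to inspect the effect of a 'demographic_parity' constraint is to compare the rate of positive predictions across the groups of the sensitive variable once the result has been brought to the client. The sketch below is independent of PAL and does not reflect how the constraint is evaluated internally. It assumes the SCORE column is available as a mapping from ID to predicted label and that the sensitive values are known per ID; the function names and toy values are hypothetical.

```python
from collections import defaultdict


def positive_rate_by_group(scores, groups, positive_label):
    """Share of positive predictions per sensitive group.

    scores: mapping ID -> predicted class label (the SCORE column)
    groups: mapping ID -> value of the sensitive variable
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for row_id, score in scores.items():
        group = groups[row_id]
        totals[group] += 1
        if score == positive_label:
            positives[group] += 1
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_gap(rates):
    """Largest difference in positive-prediction rate between any two groups."""
    return max(rates.values()) - min(rates.values())


# Toy values for illustration only.
scores = {1: "1", 2: "0", 3: "1", 4: "1"}
groups = {1: "F", 2: "F", 3: "M", 4: "M"}
rates = positive_rate_by_group(scores, groups, positive_label="1")
gap = demographic_parity_gap(rates)
```

A gap close to zero indicates that the groups receive positive predictions at similar rates, which is what demographic parity asks for.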

Inherited Methods from PALBase

Besides the methods described above, the FairMLClassification class also inherits methods from the PALBase class. Please refer to PAL Base for more details.