FairMLRegression

class hana_ml.algorithms.pal.fair_ml.FairMLRegression(fair_bound, fair_submodel='HGBT', fair_constraint='bounded_group_loss', fair_loss_func='mse', fair_loss_func_for_constraint='mse', fair_num_max_iter=None, fair_num_min_iter=None, fair_learning_rate=None, fair_norm_bound=None, fair_threshold=None, fair_exclude_sensitive_variable=None, **kwargs)

FairMLRegression aims to mitigate unfairness in a prediction model caused by possible bias in the dataset with respect to sensitive features such as sex, race, or age. It is a framework that can build on other machine learning models or techniques, which makes it quite flexible.

Parameters:

fair_bound: float

Specifies the upper bound of the constraint. Must be positive.

fair_submodel: {'HGBT'}, optional

Specifies the submodel type.

Defaults to 'HGBT'.

fair_constraint: {'bounded_group_loss'}, optional

Specifies the constraint.

Defaults to 'bounded_group_loss'.

fair_loss_func: {'mse', 'mae'}, optional

Specifies the loss function.

Defaults to 'mse'.

fair_loss_func_for_constraint: {'mse', 'mae'}, optional

Specifies the loss function used in the constraint configuration.

Defaults to 'mse'.

fair_num_max_iter: int, optional

Specifies the maximum number of iterations performed. Must be greater than or equal to fair_num_min_iter.

Defaults to 50.

fair_num_min_iter: int, optional

Specifies the minimum number of iterations performed. Must be less than or equal to fair_num_max_iter.

Defaults to 5.

fair_learning_rate: float, optional

Specifies the learning rate.

Defaults to 0.02.

fair_norm_bound: float, optional

Specifies the bound of the Lagrange multiplier. Must be positive.

Defaults to 100.

fair_threshold: float, optional

Specifies the threshold that determines when the algorithm stops iterating. A greater value gives more accuracy but takes more time. Must be non-negative; if zero is given, the threshold is decided heuristically.

Defaults to 0.0.

fair_exclude_sensitive_variable: bool, optional

Specifies whether or not to exclude the sensitive variables when training the fairness-aware model.

Defaults to True, i.e. by default the sensitive variables are excluded from the trained model.

**kwargs: keyword arguments

Parameters for initializing the submodel used for fair regression. Since 'HGBT' is the only supported submodel, these should be initialization parameters for HybridGradientBoostingRegressor.

See HybridGradientBoostingRegressor for more details.
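The interplay of fair_bound, fair_constraint and fair_loss_func_for_constraint can be illustrated with a small, library-independent sketch. It assumes the usual reading of a bounded group loss constraint, namely that the average loss within each sensitive group should not exceed the bound. The helper names `group_losses` and `satisfies_bounded_group_loss` are hypothetical and are not part of hana_ml; the sketch only checks the constraint on given predictions and does not reproduce how PAL trains the model.

```python
from collections import defaultdict


def group_losses(y_true, y_pred, groups, loss="mse"):
    """Return the mean loss of each sensitive group (hypothetical helper)."""
    if loss not in ("mse", "mae"):
        raise ValueError("loss must be 'mse' or 'mae'")
    totals = defaultdict(float)
    counts = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        err = truth - pred
        # Squared error for 'mse', absolute error for 'mae'.
        totals[group] += err * err if loss == "mse" else abs(err)
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


def satisfies_bounded_group_loss(y_true, y_pred, groups, fair_bound, loss="mse"):
    """True if every group's mean loss is at most fair_bound."""
    losses = group_losses(y_true, y_pred, groups, loss)
    return all(value <= fair_bound for value in losses.values())


# Toy data: two groups with two observations each.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.5, 3.0, 5.0]
groups = ["F", "F", "M", "M"]

losses = group_losses(y_true, y_pred, groups)
print(losses)
print(satisfies_bounded_group_loss(y_true, y_pred, groups, fair_bound=0.5))
```

In this toy data the group "M" has the larger mean squared error, so it is the group that decides whether a given bound is met.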

Examples

>>> from hana_ml.algorithms.pal.fair_ml import FairMLRegression
>>> fair_ml = FairMLRegression(fair_bound=0.5)
>>> fair_ml.fit(data=df, fair_sensitive_variable='gender')
>>> res = fair_ml.predict(data=df_predict)
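A slightly fuller workflow might look like the sketch below. It is an illustration under stated assumptions, not a definitive recipe: `conn` is assumed to be an open hana_ml.dataframe.ConnectionContext, the table names and the column names "ID", "SALARY" and "GENDER" are hypothetical, and `build_fair_ml_params` and `fit_and_predict` are helper names invented here. The keyword arguments n_estimators and max_depth are HybridGradientBoostingRegressor initialization parameters passed through **kwargs.

```python
def build_fair_ml_params(fair_bound, fair_num_min_iter=5, fair_num_max_iter=50,
                         **submodel_params):
    """Check the documented constraints and return constructor arguments.

    Hypothetical helper; the iteration defaults mirror the documented ones.
    """
    if fair_bound <= 0:
        raise ValueError("fair_bound must be positive")
    if fair_num_min_iter > fair_num_max_iter:
        raise ValueError("fair_num_min_iter must not exceed fair_num_max_iter")
    params = {
        "fair_bound": fair_bound,
        "fair_num_min_iter": fair_num_min_iter,
        "fair_num_max_iter": fair_num_max_iter,
    }
    # Remaining keyword arguments are forwarded to the HGBT submodel.
    params.update(submodel_params)
    return params


def fit_and_predict(conn, train_table, predict_table, params):
    """Train on one table and predict on another (needs a live HANA connection)."""
    # Imported lazily so that build_fair_ml_params works without hana_ml installed.
    from hana_ml.algorithms.pal.fair_ml import FairMLRegression

    df_train = conn.table(train_table)
    df_predict = conn.table(predict_table)

    model = FairMLRegression(**params)
    # "ID", "SALARY" and "GENDER" are assumed column names.
    model.fit(data=df_train, key="ID", label="SALARY",
              fair_sensitive_variable="GENDER")
    # collect() fetches the prediction result into a pandas DataFrame.
    return model.predict(data=df_predict, key="ID").collect()


params = build_fair_ml_params(fair_bound=0.5, n_estimators=20, max_depth=6)
```

Checking the iteration bounds on the client side is optional; it simply surfaces a violated constraint before any call reaches the database.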

Attributes:

model_: DataFrame

Model content.

stats_: DataFrame

Statistics.

Methods

`fit`(data[, key, features, label, ...])	Fit the model to the training dataset.
`get_model_metrics`()	Get the model metrics.
`get_score_metrics`()	Get the score metrics.
`predict`(data[, key, features, thread_ratio, ...])	Predict function for Fair ML.

fit(data, key=None, features=None, label=None, fair_sensitive_variable=None, categorical_variable=None, thread_ratio=None)

Fit the model to the training dataset.

Parameters:

data: DataFrame

The input data for training.

key: str, optional

Specifies the ID column.

If data is indexed by a single column, then key defaults to that index column; otherwise key is mandatory.

features: list of str, optional

Names of the feature columns.

If features is not provided, it defaults to all non-ID, non-label columns.

label: str, optional

Name of the dependent variable. If label is not provided, it defaults to the last non-ID column.

fair_sensitive_variable: str or list of str

Specifies the name(s) of the sensitive variable(s). Multiple variables can be given as a list.

categorical_variable: str or list of str, optional

Specifies which INTEGER columns should be treated as categorical, with all other INTEGER columns treated as continuous.

No default value.

thread_ratio: float, optional

Specifies the ratio of available threads to use, from 0 to 1. A value of 0 means a single thread, and 1 means all currently available threads. Values outside this range are ignored, and the number of threads is then determined heuristically.

Defaults to 1.0.

Returns:

A fitted object of class "FairMLRegression".
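The documented defaults for features and label can be summarized in a short sketch. It only mirrors the rules stated above (label defaults to the last non-ID column, features to all non-ID, non-label columns) and is not taken from the library's internals; `resolve_fit_columns` and the column names are hypothetical.

```python
def resolve_fit_columns(columns, key, features=None, label=None):
    """Apply the documented defaults for `features` and `label` (hypothetical helper)."""
    non_id = [col for col in columns if col != key]
    if label is None:
        # The label defaults to the last non-ID column.
        label = non_id[-1]
    if features is None:
        # The features default to all non-ID, non-label columns.
        features = [col for col in non_id if col != label]
    return features, label


columns = ["ID", "AGE", "GENDER", "EXPERIENCE", "SALARY"]
features, label = resolve_fit_columns(columns, key="ID")
print(features, label)
```

Passing features and label explicitly avoids surprises when the column order of the training table changes.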

get_model_metrics()

Get the model metrics.

Returns:

DataFrame: The model metrics.

get_score_metrics()

Get the score metrics.

Returns:

DataFrame: The score metrics.

predict(data, key=None, features=None, thread_ratio=None, model=None)

Predict function for Fair ML.

Parameters:

data: DataFrame

Data to be predicted.

key: str, optional

Name of the ID column. Mandatory if data is not indexed, or is indexed by multiple columns.

Defaults to the index of data if data is indexed by a single column.

features: list of str, optional

Names of the feature columns.

If features is not provided, it defaults to all non-ID columns.

thread_ratio: float, optional

Specifies the ratio of available threads to use, from 0 to 1. A value of 0 means a single thread, and 1 means all currently available threads. Values outside this range are ignored, and the number of threads is then determined heuristically.

Defaults to 1.0.

model: DataFrame, optional

The model to be used for prediction.

Defaults to the fitted model (model_).

Returns:

DataFrame: Predicted result.

Inherited Methods from PALBase

Besides the methods described above, the FairMLRegression class also inherits methods from the PALBase class. Please refer to PAL Base for more details.