FairMLRegression
- class hana_ml.algorithms.pal.fair_ml.FairMLRegression(fair_bound, fair_submodel='HGBT', fair_constraint='bounded_group_loss', fair_loss_func='mse', fair_loss_func_for_constraint='mse', fair_num_max_iter=None, fair_num_min_iter=None, fair_learning_rate=None, fair_norm_bound=None, fair_threshold=None, fair_exclude_sensitive_variable=None, **kwargs)
FairMLRegression aims to mitigate the unfairness of a prediction model caused by possible "bias" in the dataset with respect to features such as sex, race, or age. It is a framework that can use other machine learning models or techniques as submodels, which makes it quite flexible.
- Parameters:
- fair_bound : int
Specifies the upper bound of the constraint. Must be positive.
- fair_submodel : {'HGBT'}, optional
Specifies the submodel type.
Defaults to 'HGBT'.
- fair_constraint : {'bounded_group_loss'}, optional
Specifies the fairness constraint.
Defaults to 'bounded_group_loss'.
- fair_loss_func : {'mse', 'mae'}, optional
Specifies the loss function.
Defaults to 'mse'.
- fair_loss_func_for_constraint : {'mse', 'mae'}, optional
Specifies the loss function used in the constraint configuration.
Defaults to 'mse'.
- fair_num_max_iter : int, optional
Specifies the maximum number of iterations performed. Must be greater than or equal to fair_num_min_iter.
Defaults to 50.
- fair_num_min_iter : int, optional
Specifies the minimum number of iterations performed. Must be less than or equal to fair_num_max_iter.
Defaults to 5.
- fair_learning_rate : float, optional
Specifies the learning rate.
Defaults to 0.02.
- fair_norm_bound : float, optional
Specifies the bound of the Lagrange multiplier. Must be positive.
Defaults to 100.
- fair_threshold : float, optional
Specifies a threshold that determines when the algorithm stops iterating: the greater the value, the higher the accuracy, but the longer the run time. Must be non-negative; if 0 is given, the threshold is determined heuristically.
Defaults to 0.0.
- fair_exclude_sensitive_variable : bool, optional
Specifies whether to exclude the sensitive variable(s) when training the fairness-aware model.
Defaults to True, i.e. the sensitive variable(s) are excluded from the trained model by default.
- **kwargs : keyword arguments
Parameters for initializing the submodel used for fair regression. Since the submodel is HGBT, these should be initialization parameters for HybridGradientBoostingRegressor; see HybridGradientBoostingRegressor for more details.
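To make the roles of fair_bound, fair_constraint and fair_loss_func_for_constraint more concrete, the sketch below checks a bounded group loss constraint on a set of predictions. It assumes the usual definition from the fair regression literature, namely that the average loss within every group defined by the sensitive variable must not exceed the bound; PAL's exact formulation may differ, and the helpers group_losses and satisfies_bounded_group_loss are hypothetical illustrations, not part of hana_ml.

```python
from collections import defaultdict


def group_losses(y_true, y_pred, groups, loss="mse"):
    """Average loss per sensitive-group value (illustrative sketch, not PAL)."""
    if loss not in ("mse", "mae"):
        raise ValueError("loss must be 'mse' or 'mae'")
    totals = defaultdict(float)
    counts = defaultdict(int)
    for yt, yp, g in zip(y_true, y_pred, groups):
        err = yt - yp
        # squared error for 'mse', absolute error for 'mae'
        totals[g] += (err * err) if loss == "mse" else abs(err)
        counts[g] += 1
    return {g: totals[g] / counts[g] for g in totals}


def satisfies_bounded_group_loss(y_true, y_pred, groups, fair_bound, loss="mse"):
    """True if every group's average loss is at most fair_bound (assumed definition)."""
    if fair_bound <= 0:
        raise ValueError("fair_bound must be positive")
    losses = group_losses(y_true, y_pred, groups, loss)
    return all(value <= fair_bound for value in losses.values())
```

For instance, with targets [1.0, 2.0, 3.0, 4.0], predictions [1.5, 2.0, 2.0, 4.0] and groups ['F', 'F', 'M', 'M'], the per-group MSEs are 0.125 and 0.5, so the constraint holds for fair_bound=0.5 but not for fair_bound=0.4.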
Examples
>>> fair_ml = FairMLRegression(fair_bound=0.5)
>>> fair_ml.fit(data=df, fair_sensitive_variable='gender')
>>> res = fair_ml.predict(data=df_predict)
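The constraints documented for the constructor arguments can also be checked locally before a model is built. The following is a minimal sketch using a hypothetical helper, check_fair_params, which is not part of hana_ml; its defaults mirror the documented ones, and only the constraints stated in the parameter list are checked.

```python
def check_fair_params(fair_bound, fair_num_max_iter=50, fair_num_min_iter=5,
                      fair_norm_bound=100.0, fair_threshold=0.0,
                      fair_loss_func="mse", fair_loss_func_for_constraint="mse"):
    """Raise ValueError if an argument breaks a documented constraint (sketch)."""
    if fair_bound <= 0:
        raise ValueError("fair_bound must be positive")
    if fair_num_min_iter > fair_num_max_iter:
        raise ValueError("fair_num_min_iter must not exceed fair_num_max_iter")
    if fair_norm_bound <= 0:
        raise ValueError("fair_norm_bound must be positive")
    # 0 is allowed (the threshold is then chosen heuristically); negatives are not
    if fair_threshold < 0:
        raise ValueError("fair_threshold must be non-negative")
    for name, value in (("fair_loss_func", fair_loss_func),
                        ("fair_loss_func_for_constraint", fair_loss_func_for_constraint)):
        if value not in ("mse", "mae"):
            raise ValueError(f"{name} must be 'mse' or 'mae'")
```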
- Attributes:
- model_ : DataFrame
Model content.
- stats_ : DataFrame
Statistics.
Methods
- fit(data[, key, features, label, ...]): Fit the model to the training dataset.
- get_model_metrics(): Get the model metrics.
- get_score_metrics(): Get the score metrics.
- predict(data[, key, features, thread_ratio, ...]): Predict function for Fair ML.
- fit(data, key=None, features=None, label=None, fair_sensitive_variable=None, categorical_variable=None, thread_ratio=None)
Fit the model to the training dataset.
- Parameters:
- data : DataFrame
The input data for training.
- key : str, optional
Specifies the ID column.
If data is indexed by a single column, then key defaults to that index column; otherwise key must be specified (i.e. it is mandatory).
- features : a list of str, optional
Names of the feature columns.
If features is not provided, it defaults to all non-ID, non-label columns.
- label : str, optional
Name of the dependent variable. If label is not provided, it defaults to the last non-ID column.
- fair_sensitive_variable : str or a list of str
Specifies the name(s) of the sensitive variable(s). Multiple sensitive variables can be given as a list.
- categorical_variable : str or a list of str, optional
Specifies which INTEGER columns should be treated as categorical, with all other INTEGER columns treated as continuous.
No default value.
- thread_ratio : float, optional
Specifies the fraction of available threads to use, from 0 to 1. A value of 0 means a single thread is used, and 1 means all currently available threads are used. Values outside this range are ignored, and the number of threads is determined heuristically.
Defaults to 1.0.
- Returns:
- A fitted object of class "FairMLRegression".
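The defaulting rules for key, label and features in fit can be summarized in a few lines of plain Python. This is a sketch under stated assumptions: resolve_fit_columns is a hypothetical helper, the table is modeled as a list of column names, and the index is modeled as a string (single index column) or a list (multiple index columns) rather than a real hana_ml DataFrame.

```python
def resolve_fit_columns(columns, index=None, key=None, features=None, label=None):
    """Apply the documented fit() defaults for key, label and features (sketch)."""
    if key is None:
        # A single index column serves as the ID; otherwise key is mandatory.
        if isinstance(index, str):
            key = index
        else:
            raise ValueError("key must be given unless data is indexed by a single column")
    non_id = [c for c in columns if c != key]
    if label is None:
        label = non_id[-1]  # last non-ID column
    if features is None:
        features = [c for c in non_id if c != label]  # all non-ID, non-label columns
    return key, features, label
```

For a table with columns ID, AGE, GENDER and INCOME indexed by ID, the defaults give key 'ID', label 'INCOME' and features ['AGE', 'GENDER'].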
- get_model_metrics()
Get the model metrics.
- Returns:
- DataFrame
The model metrics.
- get_score_metrics()
Get the score metrics.
- Returns:
- DataFrame
The score metrics.
- predict(data, key=None, features=None, thread_ratio=None, model=None)
Predict function for Fair ML.
- Parameters:
- data : DataFrame
Data to be predicted.
- key : str, optional
Name of the ID column. Mandatory if data is not indexed, or is indexed by multiple columns.
Defaults to the index of data if data is indexed by a single column.
- features : a list of str, optional
Names of the feature columns.
If features is not provided, it defaults to all non-ID columns.
- thread_ratio : float, optional
Specifies the fraction of available threads to use, from 0 to 1. A value of 0 means a single thread is used, and 1 means all currently available threads are used. Values outside this range are ignored, and the number of threads is determined heuristically.
Defaults to 1.0.
- model : DataFrame, optional
The model to be used for prediction.
Defaults to the fitted model (model_).
- Returns:
- DataFrame
Predicted result.
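Both fit and predict accept thread_ratio. The sketch below illustrates the documented mapping from the ratio to a thread count; threads_from_ratio is a hypothetical helper, available stands for the number of threads the database server could use, and the rounding rule is an assumption, since the exact computation is not documented here. Returning None stands in for "the number of threads is determined heuristically".

```python
def threads_from_ratio(thread_ratio, available):
    """Translate thread_ratio into a thread count (illustrative sketch).

    Returns None for out-of-range values, meaning the count is left to a heuristic.
    """
    if thread_ratio is None:
        thread_ratio = 1.0  # documented default
    if not 0.0 <= thread_ratio <= 1.0:
        return None  # ignored; the library decides heuristically
    # 0 -> a single thread, 1 -> all available threads; rounding is assumed
    return max(1, round(thread_ratio * available))
```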
Inherited Methods from PALBase
In addition to the methods above, the FairMLRegression class inherits methods from the PALBase class; please refer to PAL Base for more details.