r2_score

hana_ml.algorithms.pal.metrics.r2_score(data, label_true, label_pred)

Computes coefficient of determination for regression results.

Parameters:
dataDataFrame

DataFrame of true and predicted values.

label_truestr

Name of the column containing true values.

label_predstr

Name of the column containing values predicted by regression.

Returns:
float

Coefficient of determination. 1.0 indicates an exact match between true and predicted values. A lower coefficient of determination indicates that the regression was able to predict less of the variance in the input. A negative value indicates that the regression performed worse than just taking the mean of the true values and using that for every prediction.

Examples

Actual and predicted values df for a hypothetical regression:

>>> df.collect()
   ACTUAL  PREDICTED
0    0.10        0.2
1    0.90        1.0
2    2.10        1.9
3    3.05        3.0
4    4.00        3.5

R2 score for these predictions:

>>> r2_score(data=df, label_true='ACTUAL', label_pred='PREDICTED')
0.9685233682514102

Compare that to the score for a perfect predictor:

>>> df_perfect.collect()
   ACTUAL  PREDICTED
0    0.10       0.10
1    0.90       0.90
2    2.10       2.10
3    3.05       3.05
4    4.00       4.00
>>> r2_score(data=df_perfect, label_true='ACTUAL', label_pred='PREDICTED')
1.0

A naive mean predictor:

>>> df_mean.collect()
   ACTUAL  PREDICTED
0    0.10       2.03
1    0.90       2.03
2    2.10       2.03
3    3.05       2.03
4    4.00       2.03
>>> r2_score(data=df_mean, label_true='ACTUAL', label_pred='PREDICTED')
0.0

And a really awful predictor df_awful:

>>> df_awful.collect()
   ACTUAL  PREDICTED
0    0.10    12345.0
1    0.90    91923.0
2    2.10    -4444.0
3    3.05    -8888.0
4    4.00    -9999.0
>>> r2_score(data=df_awful, label_true='ACTUAL', label_pred='PREDICTED')
-886477397.139857