r2_score
- hana_ml.algorithms.pal.metrics.r2_score(data, label_true, label_pred)
Computes coefficient of determination for regression results.
- Parameters:
- data : DataFrame
DataFrame of true and predicted values.
- label_true : str
Name of the column containing true values.
- label_pred : str
Name of the column containing values predicted by regression.
- Returns:
- float
Coefficient of determination. 1.0 indicates an exact match between true and predicted values. Lower values indicate that the regression explains less of the variance in the true values. A value of 0.0 corresponds to predicting the mean of the true values for every row, and a negative value indicates that the regression performed worse than that baseline.
Examples
DataFrame df of actual and predicted values for a hypothetical regression:
>>> df.collect()
   ACTUAL  PREDICTED
0    0.10        0.2
1    0.90        1.0
2    2.10        1.9
3    3.05        3.0
4    4.00        3.5
R2 score for these predictions:
>>> r2_score(data=df, label_true='ACTUAL', label_pred='PREDICTED')
0.9685233682514102
Compare that to the score for a perfect predictor:
>>> df_perfect.collect()
   ACTUAL  PREDICTED
0    0.10       0.10
1    0.90       0.90
2    2.10       2.10
3    3.05       3.05
4    4.00       4.00
>>> r2_score(data=df_perfect, label_true='ACTUAL', label_pred='PREDICTED')
1.0
A naive mean predictor:
>>> df_mean.collect()
   ACTUAL  PREDICTED
0    0.10       2.03
1    0.90       2.03
2    2.10       2.03
3    3.05       2.03
4    4.00       2.03
>>> r2_score(data=df_mean, label_true='ACTUAL', label_pred='PREDICTED')
0.0
And a really awful predictor df_awful:
>>> df_awful.collect()
   ACTUAL  PREDICTED
0    0.10    12345.0
1    0.90    91923.0
2    2.10    -4444.0
3    3.05    -8888.0
4    4.00    -9999.0
>>> r2_score(data=df_awful, label_true='ACTUAL', label_pred='PREDICTED')
-886477397.139857
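The scores in these examples can be cross-checked locally against the textbook definition of the coefficient of determination, R2 = 1 - SS_res / SS_tot. The following is a minimal sketch under that assumption. The helper `r2_reference` is hypothetical and not part of hana_ml, it works on plain Python lists rather than a HANA DataFrame, and it makes no claim about how PAL computes the value internally.

```python
def r2_reference(y_true, y_pred):
    """Textbook coefficient of determination: 1 - SS_res / SS_tot.

    Hypothetical helper for local cross-checking; not part of hana_ml.
    """
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: squared error of the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: squared error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


actual = [0.10, 0.90, 2.10, 3.05, 4.00]

# Same ACTUAL / PREDICTED columns as the example DataFrames.
score_df = r2_reference(actual, [0.2, 1.0, 1.9, 3.0, 3.5])
score_perfect = r2_reference(actual, actual)
score_mean = r2_reference(actual, [2.03] * 5)
score_awful = r2_reference(
    actual, [12345.0, 91923.0, -4444.0, -8888.0, -9999.0]
)

print(score_df, score_perfect, score_mean, score_awful)
```

Because of floating-point rounding, the mean-predictor score may come out as a tiny nonzero number rather than exactly 0.0, so compare the results against the documented values with a tolerance rather than with strict equality.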