multiclass_auc
- hana_ml.algorithms.pal.metrics.multiclass_auc(data_original, data_predict)
Computes the area under the curve (AUC) to evaluate the performance of multi-class classification algorithms.
- Parameters
- data_original : DataFrame
True class data, structured as follows:
Data point ID column.
True class of the data point.
- data_predict : DataFrame
Predicted class data, structured as follows:
Data point ID column.
Possible class.
Classifier-computed probability that the data point belongs to that particular class.
For each data point ID, there should be one row for each possible class.
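The long layout required for data_predict (one row per data point and class) often has to be produced from a wide table with one probability column per class. The following is a minimal local sketch using pandas only; the `wide` table and the helper `has_full_class_grid` are hypothetical names introduced for illustration and are not part of hana_ml. Uploading the result to SAP HANA is outside the scope of this sketch.

```python
import pandas as pd

# Hypothetical wide table: one row per data point, one probability column per class.
wide = pd.DataFrame({
    "ID": [1, 2],
    1: [0.90, 0.80],
    2: [0.05, 0.05],
    3: [0.05, 0.15],
})

# Reshape to one row per (ID, class) pair, matching the documented column order:
# ID, possible class, probability.
long = (
    wide.melt(id_vars="ID", var_name="PREDICT", value_name="PROB")
        .astype({"PREDICT": int})
        .sort_values(["ID", "PREDICT"])
        .reset_index(drop=True)
)


def has_full_class_grid(df):
    """Return True if every ID has exactly one row for each possible class."""
    n_classes = df["PREDICT"].nunique()
    per_id = df.groupby("ID")["PREDICT"].agg(["size", "nunique"])
    complete = (per_id["size"] == n_classes) & (per_id["nunique"] == n_classes)
    return bool(complete.all())
```

Running `has_full_class_grid(long)` before passing the data on is a cheap way to catch a data point that is missing a class row.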
- Returns
- float
The area under the receiver operating characteristic curve.
- DataFrame
False positive rate and true positive rate (ROC), structured as follows:
ID column, type INTEGER.
FPR, type DOUBLE, representing false positive rate.
TPR, type DOUBLE, representing true positive rate.
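To illustrate how the two return values relate, the sketch below integrates a table of FPR/TPR points with the trapezoidal rule after it has been fetched locally as a pandas DataFrame. Whether PAL uses exactly this rule internally is an assumption; `trapezoid_auc` and `roc_local` are hypothetical names, and the three ROC points are made-up illustration data.

```python
import numpy as np
import pandas as pd


def trapezoid_auc(roc):
    """Area under a ROC curve given as FPR/TPR columns (trapezoidal rule)."""
    # Integrate from FPR = 0 to FPR = 1, so order the points by increasing FPR.
    pts = roc.sort_values(["FPR", "TPR"])
    fpr = pts["FPR"].to_numpy(dtype=float)
    tpr = pts["TPR"].to_numpy(dtype=float)
    widths = np.diff(fpr)                    # horizontal step between points
    heights = (tpr[1:] + tpr[:-1]) / 2.0     # mean height of each trapezoid
    return float(np.sum(widths * heights))


# Hypothetical ROC points in the documented layout (ID, FPR, TPR).
roc_local = pd.DataFrame({
    "ID": [0, 1, 2],
    "FPR": [1.0, 0.5, 0.0],
    "TPR": [1.0, 1.0, 0.0],
})
area = trapezoid_auc(roc_local)
```

For these three points the area is 0.5 * 0.5 + 0.5 * 1.0 = 0.75.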
Examples
Input DataFrames df_original and df_predict:
>>> df_original.collect()
   ID  ORIGINAL
0   1         1
1   2         1
2   3         1
3   4         2
4   5         2
5   6         2
6   7         3
7   8         3
8   9         3
9  10         3
>>> df_predict.collect()
    ID  PREDICT  PROB
0    1        1  0.90
1    1        2  0.05
2    1        3  0.05
3    2        1  0.80
4    2        2  0.05
5    2        3  0.15
6    3        1  0.80
7    3        2  0.10
8    3        3  0.10
9    4        1  0.10
10   4        2  0.80
11   4        3  0.10
12   5        1  0.20
13   5        2  0.70
14   5        3  0.10
15   6        1  0.05
16   6        2  0.90
17   6        3  0.05
18   7        1  0.10
19   7        2  0.10
20   7        3  0.80
21   8        1  0.00
22   8        2  0.00
23   8        3  1.00
24   9        1  0.20
25   9        2  0.10
26   9        3  0.70
27  10        1  0.20
28  10        2  0.20
29  10        3  0.60
Compute Area Under Curve:
>>> auc, roc = multiclass_auc(data_original=df_original, data_predict=df_predict)
Output:
>>> print(auc)
1.0
>>> roc.collect()
    ID   FPR  TPR
0    0  1.00  1.0
1    1  0.90  1.0
2    2  0.65  1.0
3    3  0.25  1.0
4    4  0.20  1.0
5    5  0.00  1.0
6    6  0.00  0.9
7    7  0.00  0.7
8    8  0.00  0.3
9    9  0.00  0.1
10  10  0.00  0.0
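One reading that reproduces the ROC points of this example is a pooled one-vs-rest treatment: every (data point, class) row counts as one binary instance that is positive when the class equals the true class, and each distinct probability serves as a threshold. This is an observation about the example data, not a statement about how PAL is implemented. The sketch below runs locally with pandas and NumPy; `original`, `predict`, and `roc_local` are hypothetical names.

```python
import numpy as np
import pandas as pd

# True classes and predicted probabilities copied from the example.
original = pd.DataFrame({"ID": range(1, 11),
                         "ORIGINAL": [1, 1, 1, 2, 2, 2, 3, 3, 3, 3]})
probs = [  # one tuple per data point: P(class 1), P(class 2), P(class 3)
    (0.90, 0.05, 0.05), (0.80, 0.05, 0.15), (0.80, 0.10, 0.10),
    (0.10, 0.80, 0.10), (0.20, 0.70, 0.10), (0.05, 0.90, 0.05),
    (0.10, 0.10, 0.80), (0.00, 0.00, 1.00), (0.20, 0.10, 0.70),
    (0.20, 0.20, 0.60),
]
predict = pd.DataFrame(
    [(i, c, p) for i, row in enumerate(probs, start=1)
               for c, p in enumerate(row, start=1)],
    columns=["ID", "PREDICT", "PROB"],
)

# Pool all rows into one binary problem (assumption: one-vs-rest, pooled).
pooled = predict.merge(original, on="ID")
is_pos = (pooled["PREDICT"] == pooled["ORIGINAL"]).to_numpy()
score = pooled["PROB"].to_numpy()

# One threshold per distinct probability, plus one above the maximum
# so that the curve ends at (FPR, TPR) = (0, 0).
thresholds = np.append(np.unique(score), np.inf)
fpr = [float(np.mean(score[~is_pos] >= t)) for t in thresholds]
tpr = [float(np.mean(score[is_pos] >= t)) for t in thresholds]
roc_local = pd.DataFrame({"ID": range(len(thresholds)), "FPR": fpr, "TPR": tpr})
```

In the resulting table TPR stays at 1.0 until FPR has already dropped to 0, which is why the reported AUC is 1.0: every true-class probability in the example exceeds every other-class probability.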