multiclass_auc
- hana_ml.algorithms.pal.metrics.multiclass_auc(data_original, data_predict)
Computes the area under the curve (AUC) to evaluate the performance of multi-class classification algorithms.
- Parameters:
- data_original : DataFrame
  True class data, structured as follows:
  - Data point ID column.
  - True class of the data point.
- data_predict : DataFrame
  Predicted class data, structured as follows:
  - Data point ID column.
  - Possible class.
  - Classifier-computed probability that the data point belongs to that particular class.
  For each data point ID, there should be one row for each possible class.
- Returns:
- float
  The area under the receiver operating characteristic (ROC) curve.
- DataFrame
  The ROC curve as pairs of false positive rate and true positive rate, structured as follows:
  - ID column, type INTEGER.
  - FPR, type DOUBLE, representing the false positive rate.
  - TPR, type DOUBLE, representing the true positive rate.
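The returned float is the area under the curve traced by the returned (FPR, TPR) pairs. A minimal sketch of that relationship, assuming straight-line interpolation between consecutive points (the trapezoidal rule); this page does not state which integration rule PAL uses, and the function name `trapezoid_auc` is hypothetical:

```python
from typing import List, Tuple


def trapezoid_auc(points: List[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear ROC curve given (FPR, TPR) points.

    Assumption: points are joined by straight lines (trapezoidal rule).
    """
    pts = sorted(points)  # order by FPR, then TPR, so the curve runs left to right
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        # Each segment contributes width * average height.
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# A perfect classifier: the curve goes (0, 0) -> (0, 1) -> (1, 1).
print(trapezoid_auc([(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]))  # 1.0
# A random-guess diagonal.
print(trapezoid_auc([(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]))  # 0.5
```

Because `roc.collect()` returns a pandas DataFrame, its points could be passed as `list(zip(pdf["FPR"], pdf["TPR"]))` with `pdf = roc.collect()`; the result should match `auc` only if PAL also interpolates linearly between points.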
Examples
Input DataFrames df_original and df_predict:
>>> df_original.collect()
   ID  ORIGINAL
0   1         1
1   2         1
...
9  10         3
>>> df_predict.collect()
    ID  PREDICT  PROB
0    1        1  0.90
1    1        2  0.05
...
29  10        3  0.60
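The long layout that data_predict expects (one row per data point ID and possible class, with columns ID, PREDICT, PROB as above) can be produced from a wide score table. The following pure-Python sketch flattens hypothetical `{id: {class: probability}}` scores into rows and checks the one-row-per-class rule; the helper names and the sample scores are illustrative assumptions, and loading the rows into SAP HANA is left to whatever mechanism the application already uses:

```python
from typing import Dict, List, Tuple

# Hypothetical wide-format scores: data point ID -> {class label: probability}.
wide_scores: Dict[int, Dict[int, float]] = {
    1: {1: 0.90, 2: 0.05, 3: 0.05},
    2: {1: 0.80, 2: 0.10, 3: 0.10},
}


def to_long_rows(scores: Dict[int, Dict[int, float]]) -> List[Tuple[int, int, float]]:
    """Flatten {id: {class: prob}} into (ID, PREDICT, PROB) rows, one row per class."""
    rows = []
    for point_id in sorted(scores):
        for cls in sorted(scores[point_id]):
            rows.append((point_id, cls, scores[point_id][cls]))
    return rows


def check_one_row_per_class(rows: List[Tuple[int, int, float]]) -> bool:
    """Return True if every ID has exactly one row for each class seen anywhere."""
    all_classes = {cls for _, cls, _ in rows}
    per_id: Dict[int, List[int]] = {}
    for point_id, cls, _ in rows:
        per_id.setdefault(point_id, []).append(cls)
    # Comparing sorted lists also catches duplicated classes for an ID.
    return all(sorted(c) == sorted(all_classes) for c in per_id.values())


rows = to_long_rows(wide_scores)
print(rows[:3])  # [(1, 1, 0.9), (1, 2, 0.05), (1, 3, 0.05)]
print(check_one_row_per_class(rows))  # True
```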
Compute the area under the curve:
>>> from hana_ml.algorithms.pal.metrics import multiclass_auc
>>> auc, roc = multiclass_auc(data_original=df_original, data_predict=df_predict)
Output:
>>> print(auc)
1.0
>>> roc.collect()
    ID   FPR  TPR
0    0  1.00  1.0
1    1  0.90  1.0
...
10  10  0.00  0.0
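This page does not say how PAL combines the classes into a single AUC. As an offline cross-check, the sketch below assumes micro-averaged one-vs-rest scoring: every (ID, PREDICT, PROB) row becomes a binary example that is positive when PREDICT equals the data point's true class, and the AUC is the fraction of positive/negative pairs ranked correctly (the Mann-Whitney formulation, with ties counted as half). This is a commonly used technique swapped in for illustration, not PAL's documented algorithm, so its result may differ from multiclass_auc. The function name and the input rows are hypothetical, because the example tables above are elided:

```python
from typing import Dict, List, Tuple


def micro_ovr_auc(original: List[Tuple[int, int]],
                  predict: List[Tuple[int, int, float]]) -> float:
    """Micro-averaged one-vs-rest AUC (assumed scheme, not confirmed to match PAL).

    original: (ID, true class) rows.
    predict:  (ID, possible class, probability) rows, one per class per ID.
    """
    truth: Dict[int, int] = dict(original)
    # Positive examples: the score given to the true class of each point.
    pos = [p for i, c, p in predict if truth[i] == c]
    # Negative examples: the scores given to every other class.
    neg = [p for i, c, p in predict if truth[i] != c]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical data: three points, three classes, perfectly separated scores.
original = [(1, 1), (2, 2), (3, 3)]
predict = [
    (1, 1, 0.90), (1, 2, 0.05), (1, 3, 0.05),
    (2, 1, 0.10), (2, 2, 0.80), (2, 3, 0.10),
    (3, 1, 0.20), (3, 2, 0.20), (3, 3, 0.60),
]
print(micro_ovr_auc(original, predict))  # 1.0
```

The pairwise loop is quadratic in the number of rows, which keeps the sketch easy to verify but makes it suitable only for small samples.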