multiclass_auc

hana_ml.algorithms.pal.metrics.multiclass_auc(data_original, data_predict)

Computes the area under the curve (AUC) to evaluate the performance of multi-class classification algorithms.

Parameters:
data_original : DataFrame

True class data, structured as follows:

  • Data point ID column.

  • True class of the data point.

data_predict : DataFrame

Predicted class data, structured as follows:

  • Data point ID column.

  • Possible class.

  • Classifier-computed probability that the data point belongs to that particular class.

For each data point ID, there should be one row for each possible class.
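The one-row-per-class requirement can be checked before the data is passed in. The following is a minimal sketch in plain Python, assuming the rows are available locally as (ID, class, probability) tuples; the helper name check_long_format and the sample rows are hypothetical and not part of hana_ml.

```python
from collections import defaultdict

def check_long_format(rows):
    """Return True if every ID has exactly one row per possible class.

    rows: iterable of (id, predicted_class, probability) tuples, mirroring
    the three columns of data_predict. (Hypothetical helper, not hana_ml API.)
    """
    classes_by_id = defaultdict(list)
    for point_id, cls, _prob in rows:
        classes_by_id[point_id].append(cls)
    # The set of possible classes is taken to be every class seen in the data.
    all_classes = {cls for seen in classes_by_id.values() for cls in seen}
    return all(
        len(seen) == len(all_classes) and set(seen) == all_classes
        for seen in classes_by_id.values()
    )

# Hypothetical rows for two data points and three classes.
complete = [(1, 1, 0.90), (1, 2, 0.05), (1, 3, 0.05),
            (2, 1, 0.20), (2, 2, 0.70), (2, 3, 0.10)]
missing = complete[:-1]  # ID 2 lacks its row for class 3

print(check_long_format(complete))
print(check_long_format(missing))
```

The check treats the union of classes seen across all IDs as the set of possible classes, so a class that is absent for every data point would go unnoticed.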

Returns:
float

The area under the receiver operating characteristic curve.

DataFrame

Points of the receiver operating characteristic (ROC) curve, given as false positive rate and true positive rate and structured as follows:

  • ID column, type INTEGER.

  • FPR, type DOUBLE, representing false positive rate.

  • TPR, type DOUBLE, representing true positive rate.
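To illustrate how FPR/TPR pairs relate to an area, the sketch below integrates a list of ROC points with the trapezoidal rule. It is an illustration only: the function name trapezoid_auc and the sample points are hypothetical, and this is not necessarily how the returned AUC value is computed internally.

```python
def trapezoid_auc(points):
    """Area under a ROC curve from (fpr, tpr) pairs, by the trapezoidal rule.

    Hypothetical helper for illustration; not part of hana_ml.
    """
    pts = sorted(points)  # ascending FPR, ties broken by ascending TPR
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        # Width of the FPR step times the mean TPR across that step.
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Hypothetical curves: a perfect classifier and a chance-level diagonal.
perfect = [(1.0, 1.0), (0.0, 1.0), (0.0, 0.0)]
chance = [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]

print(trapezoid_auc(perfect))
print(trapezoid_auc(chance))
```

Sorting first means the points may be supplied in any order, including descending FPR.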

Examples

Input DataFrames df_original and df_predict:

>>> df_original.collect()
   ID  ORIGINAL
0   1         1
1   2         1
...
9  10         3
>>> df_predict.collect()
    ID  PREDICT  PROB
0    1        1  0.90
1    1        2  0.05
...
29  10        3  0.60

Compute the area under the curve:

>>> from hana_ml.algorithms.pal.metrics import multiclass_auc
>>> auc, roc = multiclass_auc(data_original=df_original,
...                           data_predict=df_predict)

Output:

>>> print(auc)
1.0
>>> roc.collect()
    ID   FPR  TPR
0    0  1.00  1.0
1    1  0.90  1.0
...
10  10  0.00  0.0