auc

hana_ml.algorithms.pal.metrics.auc(data, positive_label=None, output_threshold=None)

Computes area under curve (AUC) to evaluate the performance of binary-class classification algorithms.

Parameters:
dataDataFrame

Input data, structured as follows:

  • ID column.

  • True class of the data point.

  • Classifier-computed probability that the data point belongs to the positive class.

positive_labelstr, optional

If original label is not 0 or 1, specifies the label value which will be mapped to 1.

output_thresholdbool, optional

Specifies whether or not to output the corresponding threshold values in the roc table.

Defaults to False.

Returns:
float

The area under the receiver operating characteristic curve.

DataFrame

False positive rate and true positive rate (ROC), structured as follows:

  • ID column, type INTEGER.

  • FPR, type DOUBLE, representing false positive rate.

  • TPR, type DOUBLE, representing true positive rate.

  • THRESHOLD, type DOUBLE, representing the corresponding threshold value, available only when output_threshold is set to True.

Examples

Input DataFrame df:

>>> df.collect()
   ID  ORIGINAL  PREDICT
0   1         0     0.07
1   2         0     0.01
...
8   9         1     0.20
9  10         1     0.95

Compute Area Under Curve:

>>> auc, roc = auc(data=df)

Output:

>>> print(auc)
 0.66
>>> roc.collect()
   ID  FPR  TPR
0   0  1.0  1.0
1   1  0.8  1.0
2   2  0.6  1.0
3   3  0.6  0.6
4   4  0.4  0.6
5   5  0.2  0.4
6   6  0.2  0.2
7   7  0.0  0.2
8   8  0.0  0.0