auc

hana_ml.algorithms.pal.metrics.auc(data, positive_label=None, output_threshold=None)

Computes area under curve (AUC) to evaluate the performance of binary-class classification algorithms.

Parameters
dataDataFrame

Input data, structured as follows:

  • ID column.

  • True class of the data point.

  • Classifier-computed probability that the data point belongs to the positive class.

positive_labelstr, optional

If original label is not 0 or 1, specifies the label value which will be mapped to 1.

output_thresholdbool, optional

Specifies whether or not to output the corresponding threshold values in the roc table.

Defaults to False.

Returns
float

The area under the receiver operating characteristic curve.

DataFrame

False positive rate and true positive rate (ROC), structured as follows:

  • ID column, type INTEGER.

  • FPR, type DOUBLE, representing false positive rate.

  • TPR, type DOUBLE, representing true positive rate.

  • THRESHOLD, type DOUBLE, representing the corresponding threshold value, available only when output_threshold is set to True.

Examples

Input DataFrame df:

>>> df.collect()
   ID  ORIGINAL  PREDICT
0   1         0     0.07
1   2         0     0.01
2   3         0     0.85
3   4         0     0.30
4   5         0     0.50
5   6         1     0.50
6   7         1     0.20
7   8         1     0.80
8   9         1     0.20
9  10         1     0.95

Compute Area Under Curve:

>>> auc, roc = auc(data=df)

Output:

>>> print(auc)
 0.66
>>> roc.collect()
   ID  FPR  TPR
0   0  1.0  1.0
1   1  0.8  1.0
2   2  0.6  1.0
3   3  0.6  0.6
4   4  0.4  0.6
5   5  0.2  0.4
6   6  0.2  0.2
7   7  0.0  0.2
8   8  0.0  0.0