auc

hana_ml.algorithms.pal.metrics.auc(data, positive_label=None, output_threshold=None)

Computes the area under the ROC curve (AUC) to evaluate the performance of binary classification algorithms.

Parameters:

data : DataFrame

Input data, structured as follows:

ID column.

True class of the data point.

Classifier-computed probability that the data point belongs to the positive class.

positive_label : str, optional

If the original labels are not 0 and 1, specifies the label value that is mapped to 1 (the positive class).

output_threshold : bool, optional

Specifies whether to output the corresponding threshold values in the ROC table.

Defaults to False.

Returns:

float

The area under the receiver operating characteristic curve.

DataFrame

False positive rate and true positive rate (ROC), structured as follows:

ID column, type INTEGER.

FPR, type DOUBLE, representing false positive rate.

TPR, type DOUBLE, representing true positive rate.

THRESHOLD, type DOUBLE, representing the corresponding threshold value, available only when output_threshold is set to True.
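
As a rough illustration of what the returned float measures, AUC can be read as the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The sketch below is plain Python, not part of hana_ml; the helper name `pairwise_auc` is hypothetical, and whether PAL computes the value this way internally is not stated here. It uses the sample data from the Examples section.

```python
def pairwise_auc(labels, scores, positive_label=1):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5.

    Hypothetical helper for illustration only; not a hana_ml function.
    """
    pos = [s for y, s in zip(labels, scores) if y == positive_label]
    neg = [s for y, s in zip(labels, scores) if y != positive_label]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie contributes half a pair
    return wins / (len(pos) * len(neg))


# ORIGINAL and PREDICT columns of the sample DataFrame in the Examples section
original = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
predict = [0.07, 0.01, 0.85, 0.30, 0.50, 0.50, 0.20, 0.80, 0.20, 0.95]

pairwise_value = pairwise_auc(original, predict)
print(round(pairwise_value, 2))  # 0.66 (16.5 of 25 pairs)
```

The quadratic loop is fine for a quick sanity check on small samples; it is not meant as a substitute for the in-database computation.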

Examples

Input DataFrame df:

>>> df.collect()
ID  ORIGINAL  PREDICT
 1         0     0.07
 2         0     0.01
 3         0     0.85
 4         0     0.30
 5         0     0.50
 6         1     0.50
 7         1     0.20
 8         1     0.80
 9         1     0.20
10         1     0.95

Compute Area Under Curve:

>>> auc_value, roc = auc(data=df)

Output:

>>> print(auc_value)
0.66

>>> roc.collect()
ID  FPR  TPR
 0  1.0  1.0
 1  0.8  1.0
 2  0.6  1.0
 3  0.6  0.6
 4  0.4  0.6
 5  0.2  0.4
 6  0.2  0.2
 7  0.0  0.2
 8  0.0  0.0
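
The ROC table and the scalar result are consistent with each other: integrating TPR over FPR with the trapezoidal rule reproduces the printed value. The following is a minimal sketch in plain Python with the (FPR, TPR) rows copied from the table above; the function name `trapezoid_auc` is hypothetical, and the assumption that PAL integrates the curve in exactly this way is not confirmed by this page.

```python
def trapezoid_auc(points):
    """Trapezoidal area under a curve given as (fpr, tpr) pairs.

    Hypothetical helper for illustration only; not a hana_ml function.
    """
    # Sort by FPR, then TPR, so vertical steps are traversed upwards.
    pts = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# (FPR, TPR) rows copied from the ROC table above
roc_points = [
    (1.0, 1.0), (0.8, 1.0), (0.6, 1.0), (0.6, 0.6), (0.4, 0.6),
    (0.2, 0.4), (0.2, 0.2), (0.0, 0.2), (0.0, 0.0),
]

area = trapezoid_auc(roc_points)
print(round(area, 2))  # 0.66
```

With a collected pandas result, the same points could be obtained from the FPR and TPR columns of `roc.collect()`.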