auc
- hana_ml.algorithms.pal.metrics.auc(data, positive_label=None, output_threshold=None)
Computes area under curve (AUC) to evaluate the performance of binary-class classification algorithms.
- Parameters:
- data : DataFrame
Input data, structured as follows:
ID column.
True class of the data point.
Classifier-computed probability that the data point belongs to the positive class.
- positive_label : str, optional
If the original labels are not 0 and 1, specifies the label value that is mapped to 1 (the positive class).
- output_threshold : bool, optional
Specifies whether to include the corresponding threshold values in the returned ROC table.
Defaults to False.
- Returns:
- float
The area under the receiver operating characteristic curve.
- DataFrame
False positive rate and true positive rate (ROC), structured as follows:
ID column, type INTEGER.
FPR, type DOUBLE, representing false positive rate.
TPR, type DOUBLE, representing true positive rate.
THRESHOLD, type DOUBLE, representing the corresponding threshold value, available only when output_threshold is set to True.
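To illustrate what one row of the ROC table represents, the following is a minimal plain-Python sketch that computes a single (FPR, TPR) pair for a given threshold. It is not the PAL implementation: the helper name `roc_point`, the toy labels and scores, and the convention that a score greater than or equal to the threshold counts as a positive prediction are all assumptions made for illustration.

```python
def roc_point(labels, scores, threshold):
    """Return (fpr, tpr) for one threshold.

    Assumption: a data point is predicted positive when its
    score is >= threshold. Labels are expected to be 0 or 1.
    """
    tp = fp = pos = neg = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if y == 1:
            pos += 1
            if predicted_positive:
                tp += 1
        else:
            neg += 1
            if predicted_positive:
                fp += 1
    # FPR = FP / (all negatives), TPR = TP / (all positives)
    return fp / neg, tp / pos


# Hypothetical toy data, not taken from the documentation example.
labels = [0, 0, 1, 1]
scores = [0.1, 0.6, 0.4, 0.9]

point = roc_point(labels, scores, 0.5)
print(point)  # (0.5, 0.5)
```

Sweeping the threshold from high to low and collecting these pairs traces out a ROC curve of the kind returned in the FPR and TPR columns.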
Examples
Input DataFrame df:
>>> df.collect()
   ID  ORIGINAL  PREDICT
0   1         0     0.07
1   2         0     0.01
...
8   9         1     0.20
9  10         1     0.95
Compute Area Under Curve:
>>> auc, roc = auc(data=df)
Output:
>>> print(auc)
0.66
>>> roc.collect()
   ID  FPR  TPR
0   0  1.0  1.0
1   1  0.8  1.0
2   2  0.6  1.0
3   3  0.6  0.6
4   4  0.4  0.6
5   5  0.2  0.4
6   6  0.2  0.2
7   7  0.0  0.2
8   8  0.0  0.0
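As a sanity check on how the two return values relate, the sketch below integrates the FPR/TPR points from the example output with the trapezoidal rule, in plain Python. The helper name `trapezoid_auc` is hypothetical, and the trapezoidal rule is an assumption about how the area can be obtained from these points, not a statement of how PAL computes it internally.

```python
def trapezoid_auc(points):
    """Integrate TPR over FPR with the trapezoidal rule.

    points: sequence of (fpr, tpr) pairs ordered along the curve,
    either from (1, 1) down to (0, 0) or the other way round.
    """
    pts = list(points)
    # Make sure we walk the curve in order of increasing FPR.
    if pts[0][0] > pts[-1][0]:
        pts.reverse()
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        # Width of the step times the average height of its two ends.
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# (FPR, TPR) rows copied from the roc.collect() output above.
roc_points = [
    (1.0, 1.0), (0.8, 1.0), (0.6, 1.0),
    (0.6, 0.6), (0.4, 0.6), (0.2, 0.4),
    (0.2, 0.2), (0.0, 0.2), (0.0, 0.0),
]

area = trapezoid_auc(roc_points)
print(round(area, 2))  # 0.66
```

For these points the result agrees with the AUC value printed in the example.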