hanaml.AUC is an R wrapper for SAP HANA PAL AUC.

hanaml.AUC(data, key = NULL, positive.label = NULL, output.threshold = NULL)

Arguments

data

DataFrame
DataFrame containing the data, structured as follows:

  • ID: column with index.

  • True Class: true class label of the data point.

  • Classifier: computed probability that the data point belongs to the positive class.

key

character
Name of the ID column.

positive.label

character, optional
If the original labels are not 0 and 1, specifies the label value that is mapped to 1.

output.threshold

logical, optional
Specifies whether to output threshold values in the roc table.
Defaults to FALSE.

Value

Returns an "AUC" object with the following values:

  • auc, double: The area under the receiver operating characteristic (ROC) curve.

  • roc, DataFrame: False positive rate and true positive rate, structured as follows:

    • ID, type INTEGER, index column.

    • FPR, type DOUBLE representing false positive rate.

    • TPR, type DOUBLE representing true positive rate.

    • THRESHOLD, type DOUBLE representing the corresponding threshold value, available only when output.threshold is set to TRUE.
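
As a rough illustration of how the roc columns relate to one another, the following base R sketch sweeps candidate thresholds over the ten rows used in the Examples section and records FPR and TPR at each one. It runs locally and does not call SAP HANA PAL. The helper name `roc_points`, the `>=` comparison rule, and the choice of candidate thresholds are assumptions made for illustration; the thresholds PAL actually reports may differ.

```r
# Local sketch, not the PAL implementation: sweep thresholds and record FPR/TPR.
# `roc_points` is a hypothetical helper name used only for this illustration.
roc_points <- function(truth, score) {
  # Candidate thresholds: each distinct score, plus Inf so the last row is (0, 0).
  thresholds <- c(sort(unique(score)), Inf)
  n_pos <- sum(truth == 1)
  n_neg <- sum(truth == 0)
  # Assumption: a point is predicted positive when score >= threshold.
  fpr <- vapply(thresholds,
                function(t) sum(score >= t & truth == 0) / n_neg, numeric(1))
  tpr <- vapply(thresholds,
                function(t) sum(score >= t & truth == 1) / n_pos, numeric(1))
  data.frame(ID = seq_along(thresholds) - 1L,
             FPR = fpr, TPR = tpr, THRESHOLD = thresholds)
}

# Same true classes and classifier scores as in the Examples section.
truth <- c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1)
score <- c(0.07, 0.01, 0.85, 0.30, 0.50, 0.50, 0.20, 0.80, 0.20, 0.95)

roc <- roc_points(truth, score)
print(roc)
```

Under these assumptions the FPR and TPR columns agree with the roc output shown in the Examples section.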

Details

Area under the curve (AUC) is a standard measure for evaluating the performance of classification algorithms. It is defined for binary classifiers, but it can also be extended to the multi-class case.
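
For a binary classifier, the AUC equals the probability that a randomly chosen positive point receives a higher score than a randomly chosen negative one, with ties counted as one half. A minimal base R sketch of that rank-based view follows. It runs locally on the data from the Examples section and is not the PAL implementation; `auc_rank` is a hypothetical helper name.

```r
# Local sketch, not the PAL implementation: rank-based (Mann-Whitney) AUC.
# `auc_rank` is a hypothetical helper name used only for this illustration.
auc_rank <- function(truth, score) {
  r <- rank(score)                 # tied scores receive their average rank
  n_pos <- sum(truth == 1)
  n_neg <- sum(truth == 0)
  # Sum of positive ranks, minus the smallest possible such sum,
  # divided by the number of positive/negative pairs.
  (sum(r[truth == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Same true classes and classifier scores as in the Examples section.
truth <- c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1)
score <- c(0.07, 0.01, 0.85, 0.30, 0.50, 0.50, 0.20, 0.80, 0.20, 0.95)

auc_value <- auc_rank(truth, score)
print(auc_value)  # 0.66
```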

Examples

Input DataFrame data:


> data$Collect()
   ID  ORIGINAL  PREDICT
1   1         0     0.07
2   2         0     0.01
3   3         0     0.85
4   4         0     0.30
5   5         0     0.50
6   6         1     0.50
7   7         1     0.20
8   8         1     0.80
9   9         1     0.20
10 10         1     0.95

Compute Area Under Curve:


> auc <- hanaml.AUC(data = data)

Output:


> auc$auc
[1] 0.66

> auc$roc$Collect()

   ID  FPR  TPR
1   0  1.0  1.0
2   1  0.8  1.0
3   2  0.6  1.0
4   3  0.6  0.6
5   4  0.4  0.6
6   5  0.2  0.4
7   6  0.2  0.2
8   7  0.0  0.2
9   8  0.0  0.0
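
As a consistency check, the auc value can be recovered from the roc table above by applying the trapezoidal rule to the (FPR, TPR) points. The sketch below copies those points into plain R vectors rather than collecting them from HANA; the variable names are illustrative only, and this is a local check, not a description of how PAL computes the value.

```r
# FPR and TPR values copied from the roc output above.
fpr <- c(1.0, 0.8, 0.6, 0.6, 0.4, 0.2, 0.2, 0.0, 0.0)
tpr <- c(1.0, 1.0, 1.0, 0.6, 0.6, 0.4, 0.2, 0.2, 0.0)

# Trapezoidal rule over consecutive points.
# FPR decreases down the table, so abs() turns the differences into widths.
widths  <- abs(diff(fpr))
heights <- (tpr[-1] + tpr[-length(tpr)]) / 2
auc_trapezoid <- sum(widths * heights)

print(round(auc_trapezoid, 2))  # 0.66
```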