hanaml.confusion.matrix

Compute a confusion matrix to evaluate the accuracy of a classification.
hanaml.confusion.matrix(data, key, label.true = NULL, label.pred = NULL, beta = NULL)
| Argument | Description |
|---|---|
| data | DataFrame containing the data, including the original (true) and predicted labels. |
| key | Name of the ID column of data. |
| label.true | Name of the column holding the original (true) class labels. |
| label.pred | Name of the column holding the predicted class labels. |
| beta | Optional. Beta value for the F-beta score reported in the classification report; defaults to 1, i.e. the F1 score. |
Returns a list of two DataFrames:
DataFrame 1
Confusion matrix, structured as follows:
Original label: with the same name and data type as the label.true column in data.
Predicted label: with the same name and data type as the label.pred column in data.
Count: type INTEGER, the number of data points with the corresponding combination of predicted and original label.
The DataFrame is sorted by (original label, predicted label) in descending order. NOTE: The original-label column and the predicted-label column must have the same data type.
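As an illustration only (this is a client-side sketch, not the SAP HANA implementation), the Count column of DataFrame 1 amounts to tallying each (original, predicted) label pair. The pairs below are taken from the worked example further down this page:

```python
from collections import Counter

# (original, predicted) label pairs from the worked example on this page
pairs = [(1, 1), (1, 1), (1, 1), (1, 2), (1, 1),
         (2, 2), (2, 1), (2, 2), (2, 2), (2, 2)]

# Tally each (original, predicted) combination, mirroring DataFrame 1
counts = Counter(pairs)
for (orig, pred), n in sorted(counts.items()):
    print(orig, pred, n)
```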
DataFrame 2
Classification report, structured as follows:
Class: type NVARCHAR(100), class name
Recall: type DOUBLE, the recall of each class
Precision: type DOUBLE, the precision of each class
F_MEASURE: type DOUBLE, the F_measure of each class
SUPPORT: type INTEGER, the support, i.e. the number of samples in each class
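For illustration, the report columns follow the standard per-class definitions of recall, precision, and the F-beta score. This is a hedged sketch of those formulas applied to the example data on this page, assuming beta defaults to 1 when not supplied; it is not the HANA implementation:

```python
# (original, predicted) label pairs from the worked example on this page
pairs = [(1, 1), (1, 1), (1, 1), (1, 2), (1, 1),
         (2, 2), (2, 1), (2, 2), (2, 2), (2, 2)]
beta = 1.0  # assumed default when beta = NULL, i.e. the plain F1 score

report = {}
for cls in sorted({o for o, _ in pairs}):
    tp = sum(1 for o, p in pairs if o == cls and p == cls)   # true positives
    support = sum(1 for o, _ in pairs if o == cls)           # true members of cls
    predicted = sum(1 for _, p in pairs if p == cls)         # predicted as cls
    recall = tp / support
    precision = tp / predicted
    # F-beta score: weighted harmonic mean of precision and recall
    f_measure = ((1 + beta**2) * precision * recall
                 / (beta**2 * precision + recall))
    report[cls] = (recall, precision, f_measure, support)

print(report)
```

With the example data, both classes come out with recall, precision, and F-measure of 0.8 on a support of 5, matching the report shown in the example output.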
Assume a DataFrame df from which to calculate the confusion matrix:
> df$Collect()
ID ORIGINAL PREDICT
1 1 1 1
2 2 1 1
3 3 1 1
4 4 1 2
5 5 1 1
6 6 2 2
7 7 2 1
8 8 2 2
9 9 2 2
10 10 2 2
Calculate the confusion matrix:
> res <- hanaml.confusion.matrix(data = df,
                                 key = "ID", label.true = "ORIGINAL",
                                 label.pred = "PREDICT")
> cm <- res[[1]]
> cr <- res[[2]]
Output:
> cm$Collect()
  ORIGINAL PREDICT COUNT
1        1       1     4
2        1       2     1
3        2       1     1
4        2       2     4
> cr$Collect()
  CLASS RECALL PRECISION F_MEASURE SUPPORT
1     1    0.8       0.8       0.8       5
2     2    0.8       0.8       0.8       5