Compute confusion matrix to evaluate the accuracy of a classification.

hanaml.confusion.matrix(
  data,
  key,
  label.true = NULL,
  label.pred = NULL,
  beta = NULL
)

Arguments

data

DataFrame
DataFrame containing the data.

key

character
Name of the ID column.

label.true

character, optional
Name of the original (true) label column.
If not given, defaults to the first non-ID column.

label.pred

character, optional
Name of the predicted label column.
If not given, defaults to the second non-ID column.

beta

double, optional
Weighting parameter used to compute the F-Beta score, with recall
weighted beta times as much as precision.
Defaults to 1 (the F1 score).
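For reference, the F-Beta score combines precision and recall, with beta
controlling the weight of recall relative to precision; beta = 1 yields the
usual F1 score. A minimal sketch of the formula in plain R (the helper
f.beta is illustrative only and not part of the package):

```r
# F-Beta score: beta > 1 weights recall more heavily, beta < 1 favours precision.
f.beta <- function(precision, recall, beta = 1) {
  (1 + beta^2) * precision * recall / (beta^2 * precision + recall)
}

f.beta(precision = 0.8, recall = 0.8, beta = 1)  # equals 0.8 (the F1 score)
```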

Value

Returns a list of DataFrames:

  • DataFrame 1
    Confusion matrix, structured as follows:

    • Original label: with the same name and data type as in data.

    • Predicted label: with the same name and data type as in data.

    • Count: type INTEGER, the number of data points with the corresponding combination of predicted and original label.

    The DataFrame is sorted by (original label, predicted label) in descending order. NOTE: the original label column and the predicted label column must have the same data type.

  • DataFrame 2
    Classification report, structured as follows:

    • Class: type NVARCHAR(100), class name

    • Recall: type DOUBLE, the recall of each class

    • Precision: type DOUBLE, the precision of each class

    • F_MEASURE: type DOUBLE, the F-measure (F-Beta score) of each class

    • SUPPORT: type INTEGER, the support, i.e. the number of samples in each class

Examples

Input DataFrame df used to calculate the confusion matrix:


   > df$Collect()
      ID  ORIGINAL  PREDICT
   1   1         1        1
   2   2         1        1
   3   3         1        1
   4   4         1        2
   5   5         1        1
   6   6         2        2
   7   7         2        1
   8   8         2        2
   9   9         2        2
   10 10         2        2

Calculate the confusion matrix:


> res <- hanaml.confusion.matrix(data = df,
                                 key = "ID", label.true = "ORIGINAL",
                                 label.pred = "PREDICT")
> cm <- res[[1]]
> cr <- res[[2]]

Return:


> cm$Collect()
   ORIGINAL  PREDICT  COUNT
1         1        1      4
2         1        2      1
3         2        1      1
4         2        2      4

> cr$Collect()
  CLASS  RECALL  PRECISION  F_MEASURE  SUPPORT
1     1     0.8        0.8        0.8        5
2     2     0.8        0.8        0.8        5
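The reported figures can be cross-checked with base R from the same labels
(an illustrative sketch only; hanaml.confusion.matrix computes these
in-database via PAL):

```r
# True and predicted labels mirroring the example DataFrame above.
truth <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
pred  <- c(1, 1, 1, 2, 1, 2, 1, 2, 2, 2)

cm <- table(truth, pred)             # confusion matrix (counts)
recall    <- diag(cm) / rowSums(cm)  # per-class recall:    4/5 = 0.8
precision <- diag(cm) / colSums(cm)  # per-class precision: 4/5 = 0.8
support   <- rowSums(cm)             # samples per true class: 5 and 5
f1 <- 2 * precision * recall / (precision + recall)  # F1 = 0.8 for both classes
```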