hanaml.confusion.matrix.Rd
Compute the confusion matrix to evaluate the accuracy of a classification.
hanaml.confusion.matrix(
data,
key,
label.true = NULL,
label.pred = NULL,
beta = NULL
)
DataFrame
DataFrame containing the data.
character
Name of the ID column.
character, optional
Name of the original label column.
If not given, defaults to the 1st non-ID column.
character, optional
Name of the predicted label column.
If not given, defaults to the 2nd non-ID column.
double, optional
Parameter used to compute the F-Beta score.
Defaults to 1.
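For reference, the F-Beta score combines precision and recall as
(1 + beta^2) * precision * recall / (beta^2 * precision + recall),
so beta > 1 weights recall more heavily and beta < 1 weights precision
more heavily. A minimal base-R sketch (the helper f.beta is illustrative
and not part of hana.ml.r):

f.beta <- function(precision, recall, beta = 1) {
  # beta = 1 reduces to the harmonic mean of precision and recall (F1)
  (1 + beta^2) * precision * recall / (beta^2 * precision + recall)
}
f.beta(0.8, 0.8)  # 0.8, matching F_MEASURE in the example below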
Returns a list of DataFrames:
DataFrame 1
Confusion matrix, structured as follows:
Original label: same name and data type as in data.
Predicted label: same name and data type as in data.
Count: type INTEGER, the number of data points with the corresponding combination of original and predicted labels.
The DataFrame is sorted by (original label, predicted label) in ascending order, as in the example below. NOTE: the label.true column and the label.pred column must have the same data type.
DataFrame 2
Classification report, structured as follows:
CLASS: type NVARCHAR(100), the class name.
RECALL: type DOUBLE, the recall of each class.
PRECISION: type DOUBLE, the precision of each class.
F_MEASURE: type DOUBLE, the F-measure (F-Beta score) of each class.
SUPPORT: type INTEGER, the support, i.e. the number of samples in each class.
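Since hanaml.confusion.matrix returns an ordinary R list, the two
DataFrames are extracted positionally; a minimal access pattern
(variable names are illustrative):

res <- hanaml.confusion.matrix(data = df, key = "ID")
cm <- res[[1]]  # DataFrame 1: confusion matrix
cr <- res[[2]]  # DataFrame 2: classification report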
Input DataFrame df from which to calculate the confusion matrix:
> df$Collect()
ID ORIGINAL PREDICT
1 1 1 1
2 2 1 1
3 3 1 1
4 4 1 2
5 5 1 1
6 6 2 2
7 7 2 1
8 8 2 2
9 9 2 2
10 10 2 2
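Such a DataFrame usually points to a table already in SAP HANA; it can
also be created by uploading a local data.frame. A rough sketch,
assuming an existing connection context conn (the table name and the
exact ConvertToHANADataFrame arguments are assumptions that may vary by
hana.ml.r version):

local.df <- data.frame(ID = 1:10,
                       ORIGINAL = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
                       PREDICT  = c(1, 1, 1, 2, 1, 2, 1, 2, 2, 2))
# Upload to SAP HANA and obtain a hana.ml.r DataFrame (assumed API):
df <- ConvertToHANADataFrame(conn, local.df, "CM_DEMO_TBL", force = TRUE)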
Calculate the confusion matrix:
> res <- hanaml.confusion.matrix(data = df,
                                 key = "ID", label.true = "ORIGINAL",
                                 label.pred = "PREDICT")
> cm <- res[[1]]
> cr <- res[[2]]
Result:
> cm$Collect()
ORIGINAL PREDICT COUNT
1 1 1 4
2 1 2 1
3 2 1 1
4 2 2 4
> cr$Collect()
CLASS RECALL PRECISION F_MEASURE SUPPORT
1 1 0.8 0.8 0.8 5
2 2 0.8 0.8 0.8 5
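To verify the report by hand: class 1 has SUPPORT 5 (five rows with
ORIGINAL = 1), of which 4 were predicted correctly, so RECALL = 4/5 =
0.8; five rows were predicted as class 1 in total, 4 of them correctly,
so PRECISION = 4/5 = 0.8. With the default beta = 1:

> (1 + 1^2) * 0.8 * 0.8 / (1^2 * 0.8 + 0.8)
[1] 0.8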