confusion_matrix

hana_ml.algorithms.pal.metrics.confusion_matrix(data, key, label_true=None, label_pred=None, beta=None, native=False)

Computes a confusion matrix to evaluate the accuracy of a classification.

Parameters
data : DataFrame

DataFrame containing the data.

key : str

Name of the ID column.

label_true : str, optional

Name of the original label column.

If not given, defaults to the second column.

label_pred : str, optional

Name of the predicted label column.

If not given, defaults to the third column.

beta : float, optional

Parameter used to compute the F-Beta score.

Defaults to 1.
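
For reference, the F-Beta score is commonly defined as a weighted harmonic mean of precision and recall. The sketch below uses that standard definition. The helper name `f_beta` is hypothetical and not part of hana_ml, and it is an assumption that PAL applies exactly this formula.

```python
def f_beta(precision, recall, beta=1.0):
    """Standard F-Beta score: (1 + beta^2) * P * R / (beta^2 * P + R)."""
    denom = beta ** 2 * precision + recall
    if denom == 0:
        # Degenerate case (precision and recall both zero): return 0.0.
        # This convention is a choice made for this sketch.
        return 0.0
    return (1 + beta ** 2) * precision * recall / denom


# beta = 1 weights precision and recall equally (the F1 score).
print(f_beta(0.8, 0.8))
# beta > 1 weights recall more heavily than precision.
print(f_beta(0.5, 1.0, beta=2.0))
```

With beta = 1 and equal precision and recall, the score equals that common value, which is consistent with the F_MEASURE column in the example at the end of this page.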

native : bool, optional

Indicates whether to use native SQL statements for the confusion matrix calculation.

Defaults to False.

Returns
(DataFrame, DataFrame)
Confusion matrix, structured as follows:
  • Original label, with same name and data type as it is in data.

  • Predicted label, with same name and data type as it is in data.

  • Count, type INTEGER, the number of data points with the corresponding combination of predicted and original label.

The DataFrame is sorted by (original label, predicted label) in descending order.

Classification report table, structured as follows:
  • Class, type NVARCHAR(100), class name

  • Recall, type DOUBLE, the recall of each class

  • Precision, type DOUBLE, the precision of each class

  • F_MEASURE, type DOUBLE, the F-measure (F-Beta score) of each class

  • SUPPORT, type INTEGER, the support, i.e. the number of samples in each class

Examples

The DataFrame df contains the original and predicted labels:

>>> df.collect()
   ID  ORIGINAL  PREDICT
0   1         1        1
1   2         1        1
2   3         1        1
3   4         1        2
4   5         1        1
5   6         2        2
6   7         2        1
7   8         2        2
8   9         2        2
9  10         2        2

Calculate the confusion matrix:

>>> cm, cr = confusion_matrix(data=df, key='ID', label_true='ORIGINAL', label_pred='PREDICT')

Output:

>>> cm.collect()
   ORIGINAL  PREDICT  COUNT
0         1        1      4
1         1        2      1
2         2        1      1
3         2        2      4
>>> cr.collect()
  CLASS  RECALL  PRECISION  F_MEASURE  SUPPORT
0     1     0.8        0.8        0.8        5
1     2     0.8        0.8        0.8        5
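
The figures above can be cross-checked without a HANA connection. The following is a minimal pure-Python sketch, not the PAL implementation: it recomputes the counts and the per-class report from the ten label pairs shown in df. The helper names `confusion_counts` and `classification_report` are hypothetical, and the F-measure uses the standard F-Beta formula with beta = 1.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count each (original, predicted) pair, sorted ascending to match the example output."""
    return sorted(Counter(zip(y_true, y_pred)).items())


def classification_report(y_true, y_pred, beta=1.0):
    """Return {class: (recall, precision, f_measure, support)}."""
    pairs = Counter(zip(y_true, y_pred))
    report = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        tp = pairs[(cls, cls)]  # correctly predicted samples of this class
        support = sum(n for (t, _), n in pairs.items() if t == cls)
        predicted = sum(n for (_, p), n in pairs.items() if p == cls)
        recall = tp / support if support else 0.0
        precision = tp / predicted if predicted else 0.0
        denom = beta ** 2 * precision + recall
        f_measure = (1 + beta ** 2) * precision * recall / denom if denom else 0.0
        report[cls] = (recall, precision, f_measure, support)
    return report


# ORIGINAL and PREDICT columns copied from the example DataFrame.
original = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
predict = [1, 1, 1, 2, 1, 2, 1, 2, 2, 2]

for (true_label, pred_label), count in confusion_counts(original, predict):
    print(true_label, pred_label, count)

for cls, row in classification_report(original, predict).items():
    print(cls, row)
```

Each class has 4 correct predictions out of 5 samples and 4 correct out of 5 predictions, so recall and precision are both 4/5, in line with the cr output.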