hanaml.DiscriminantAnalysis is an R wrapper for SAP HANA PAL Linear Discriminant Analysis.

hanaml.DiscriminantAnalysis(
  data = NULL,
  key = NULL,
  features = NULL,
  label = NULL,
  regularization.type = NULL,
  regularization.amount = NULL,
  projection = NULL
)

Arguments

data

DataFrame
DataFrame containing the data.

key

character
Name of the ID column.

features

character or list of characters, optional
Names of the feature columns.
If not provided, it defaults to all non-key, non-label columns of data.

label

character, optional
Name of the column which specifies the dependent variable.
Defaults to the last column of data if not provided.

regularization.type

character, optional
The strategy for handling ill-conditioning or rank-deficiency of the empirical covariance matrix.

  • 'mixing': uses a regularized covariance estimate.

  • 'diag': uses a diagonal covariance estimate.

  • 'pseudo': uses a pseudo-inverse covariance estimate.

Defaults to 'mixing'.

regularization.amount

double, optional
The convex mixing weight assigned to the diagonal matrix obtained from the diagonal of the empirical covariance matrix. The valid range for this parameter is (0,1). Valid only when regularization.type is 'mixing'.
Defaults to the smallest number in (0,1) that makes the regularized empirical covariance matrix invertible.
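The effect of the mixing weight can be illustrated outside of HANA with base R. The sketch below assumes the regularized estimate is the convex combination (1 - amount) * S + amount * diag(S), where S is the empirical covariance matrix, as the description above suggests. The helper mix.covariance is hypothetical and is not part of hana.ml.r or PAL.

```r
# Toy data in which the second feature is an exact multiple of the first,
# so the empirical covariance matrix S is singular (rank-deficient).
x1 <- c(1, 2, 3, 4, 5)
X <- cbind(x1 = x1, x2 = 2 * x1)
S <- cov(X)

# Hypothetical helper (not a hana.ml.r function): convex mix of S and the
# diagonal matrix built from the diagonal of S.
mix.covariance <- function(S, amount) {
  (1 - amount) * S + amount * diag(diag(S))
}

S.reg <- mix.covariance(S, amount = 0.5)

det(S)        # numerically zero, so S cannot be inverted
det(S.reg)    # strictly positive, so S.reg can be inverted
solve(S.reg)  # inverse of the regularized estimate
```

Under this assumption, a larger regularization.amount pulls the estimate further toward a purely diagonal covariance matrix.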

projection

logical, optional
Whether or not to compute the projection model.
Defaults to TRUE.

Value

Returns a "DiscriminantAnalysis" object with the following attributes:

  • basic.info DataFrame
    Basic information of the training data for linear discriminant analysis.

  • priors DataFrame
    The empirical priors for each class in the training data.

  • coef DataFrame
    Coefficients (including intercepts) of each class's linear score function for the training data.

  • proj.info DataFrame
    Projection-related information, such as the standard deviations of the discriminants and the proportion of the total variance explained by each discriminant.

  • proj.model DataFrame
    The projection matrix and overall means for features.

Details

Linear discriminant analysis for classification and data reduction.

Examples

Input DataFrame data:


> data$Collect()
   ID    X1   X2   X3   X4            CLASS
1   0   5.1  3.5  1.4  0.2      Iris-setosa
2   1   4.9  3.0  1.4  0.2      Iris-setosa
3   2   4.7  3.2  1.3  0.2      Iris-setosa
4   3   4.6  3.1  1.5  0.2      Iris-setosa
5   4   5.0  3.6  1.4  0.2      Iris-setosa
6   5   5.4  3.9  1.7  0.4      Iris-setosa
    ......
25 24   6.5  3.0  5.8  2.2   Iris-virginica
26 25   7.6  3.0  6.6  2.1   Iris-virginica
27 26   4.9  2.5  4.5  1.7   Iris-virginica
28 27   7.3  2.9  6.3  1.8   Iris-virginica
29 28   6.7  2.5  5.8  1.8   Iris-virginica
30 29   7.2  3.6  6.1  2.5   Iris-virginica

Call the function:


> lda <- hanaml.DiscriminantAnalysis(data = data,
                                     key = "ID",
                                     label = "CLASS",
                                     regularization.type = "mixing",
                                     regularization.amount = 0.5,
                                     projection = TRUE)

Output:


> lda$coef$Collect()
              CLASS   COEFF_X1   COEFF_X2   COEFF_X3   COEFF_X4   INTERCEPT
 1      Iris-setosa  23.907391  51.754001 -34.641902 -49.063407 -113.235478
 2  Iris-versicolor   0.511034  15.652078  15.209568  -4.861018  -53.898190
 3   Iris-virginica -14.729636   4.981955  42.511486  12.315007  -94.143564
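To show how these coefficients can be read, the following base R sketch evaluates each class's linear score for the first input row (ID 0). It assumes the usual LDA convention that a score is the intercept plus the coefficient-weighted sum of the feature values, and that the class with the largest score is the predicted one. The variable names are illustrative, and the values are copied from the outputs above rather than fetched from HANA.

```r
# Coefficients copied from the lda$coef$Collect() output above.
coef <- data.frame(
  CLASS     = c("Iris-setosa", "Iris-versicolor", "Iris-virginica"),
  COEFF_X1  = c(23.907391, 0.511034, -14.729636),
  COEFF_X2  = c(51.754001, 15.652078, 4.981955),
  COEFF_X3  = c(-34.641902, 15.209568, 42.511486),
  COEFF_X4  = c(-49.063407, -4.861018, 12.315007),
  INTERCEPT = c(-113.235478, -53.898190, -94.143564)
)

# Feature values of the first row of the input data (ID 0).
x <- c(X1 = 5.1, X2 = 3.5, X3 = 1.4, X4 = 0.2)

# Assumed score per class: intercept + sum(coefficient * feature value).
W <- as.matrix(coef[, c("COEFF_X1", "COEFF_X2", "COEFF_X3", "COEFF_X4")])
scores <- as.vector(W %*% x) + coef$INTERCEPT
names(scores) <- coef$CLASS

scores                    # one score per class
names(which.max(scores))  # class with the largest score
```

For this row the largest score belongs to Iris-setosa, which agrees with the row's label.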

> lda$proj.model$Collect()
              NAME        X1        X2        X3        X4
 1  DISCRIMINANT_1  1.907978  2.399516 -3.846154 -3.112216
 2  DISCRIMINANT_2  3.046794 -4.575496 -2.757271  2.633037
 3    OVERALL_MEAN  5.843333  3.040000  3.863333  1.213333
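The projection model can be applied by hand as a rough check. The sketch below is an assumption about how the pieces fit together, not a description of PAL's implementation: it centers each row on OVERALL_MEAN and takes the dot product with each discriminant vector. The variable names are illustrative, and the values are copied from the outputs above.

```r
# Values copied from the lda$proj.model$Collect() output above.
discriminants <- rbind(
  DISCRIMINANT_1 = c(1.907978, 2.399516, -3.846154, -3.112216),
  DISCRIMINANT_2 = c(3.046794, -4.575496, -2.757271, 2.633037)
)
overall.mean <- c(5.843333, 3.040000, 3.863333, 1.213333)

# Two rows from the input data: ID 0 (Iris-setosa) and ID 24 (Iris-virginica).
X <- rbind(
  setosa    = c(5.1, 3.5, 1.4, 0.2),
  virginica = c(6.5, 3.0, 5.8, 2.2)
)

# Assumption: subtract the overall mean from every row, then project the
# centered rows onto the discriminant vectors.
centered <- sweep(X, 2, overall.mean)
projected <- centered %*% t(discriminants)
projected  # 2 x 2 matrix: one row per sample, one column per discriminant
```

Under this assumption the two samples fall on opposite sides of zero along the first discriminant, which is the kind of class separation the projection is meant to expose.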