R: Linear Discriminant Analysis

hanaml.DiscriminantAnalysis {hana.ml.r}

R Documentation

Linear Discriminant Analysis

Description

hanaml.DiscriminantAnalysis is an R wrapper for the PAL Linear Discriminant Analysis algorithm.

Usage

hanaml.DiscriminantAnalysis (conn.context,
                             data = NULL,
                             key = NULL,
                             features = NULL,
                             label = NULL,
                             regularization.type = NULL,
                             regularization.amount = NULL,
                             projection = NULL)

Arguments

`conn.context`	`ConnectionContext` Connection to the SAP HANA system.
`data`	`DataFrame` DataFrame containing the data.
`key`	`character, optional` Name of the ID column. Defaults to the first column.
`features`	`character or list of characters, optional` Names of the feature columns. If not provided, defaults to all non-ID, non-label columns.
`label`	`character` Name of the column in data that specifies the dependent variable. Defaults to the last column.
`regularization.type`	`character, optional` The strategy for handling ill-conditioning or rank-deficiency of the empirical covariance matrix. `'mixing'`: uses a regularized covariance estimate. `'diag'`: uses a diagonal covariance estimate. `'pseudo'`: uses a pseudo-inverse covariance estimate. Defaults to `'mixing'`.
`regularization.amount`	`float, optional` The convex mixing weight assigned to the diagonal matrix obtained from the diagonal of the empirical covariance matrix. The valid range is (0,1). Valid only when `regularization.type` is `'mixing'`. Defaults to the smallest number in (0,1) that makes the regularized empirical covariance matrix invertible.
`projection`	`logical, optional` Whether or not to compute the projection model. Defaults to TRUE.
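
The `'mixing'` strategy is described above as a convex combination of the empirical covariance matrix and its diagonal. The following base-R sketch illustrates that idea on toy data. It is not the PAL implementation, and the helper name `mix_covariance` is hypothetical.

```r
# Illustrative only: a convex "mixing" covariance estimate in base R.
# Not the PAL implementation; mix_covariance is a hypothetical helper.
mix_covariance <- function(x, amount) {
  stopifnot(amount > 0, amount < 1)
  s <- cov(x)          # empirical covariance matrix
  d <- diag(diag(s))   # diagonal matrix built from the diagonal of s
  (1 - amount) * s + amount * d
}

# Rank-deficient toy data: the third column duplicates the first.
x <- cbind(a = c(1, 2, 3, 4), b = c(2, 1, 4, 3), c = c(1, 2, 3, 4))
s_emp <- cov(x)
s_reg <- mix_covariance(x, amount = 0.5)

qr(s_emp)$rank   # rank-deficient, so s_emp cannot be inverted
qr(s_reg)$rank   # full rank after mixing with the diagonal
```

Because the weight is applied to the diagonal part only, the variances on the diagonal stay unchanged while the off-diagonal covariances shrink toward zero.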

Format

R6Class object.

Details

Linear discriminant analysis for classification and data reduction.

Value

basic.info DataFrame
Basic information about the training data for linear discriminant analysis.
priors DataFrame
The empirical priors for each class in the training data.
coef DataFrame
Coefficients and intercept of the linear score function for each class.
proj.info DataFrame
Projection-related information, such as the standard deviations of the discriminants and the proportion of the total variance explained by each discriminant.
proj.model DataFrame
The projection matrix and overall means for features.
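
As a rough illustration of how `proj.model` could be used for data reduction, the sketch below copies the `proj.model` rows printed in the Examples section into a local data frame and projects one observation. It assumes that a point is first centered on the overall mean and then multiplied by the discriminant rows. That convention is an assumption made for this sketch, not something this page confirms, and the helper `project` is hypothetical.

```r
# Rows copied from the lda$proj.model output in the Examples section.
proj_model <- data.frame(
  NAME = c("DISCRIMINANT_1", "DISCRIMINANT_2", "OVERALL_MEAN"),
  X1 = c(1.907978, 3.046794, 5.843333),
  X2 = c(2.399516, -4.575496, 3.040000),
  X3 = c(-3.846154, -2.757271, 3.863333),
  X4 = c(-3.112216, 2.633037, 1.213333)
)

features <- c("X1", "X2", "X3", "X4")
is_mean <- proj_model$NAME == "OVERALL_MEAN"
w <- as.matrix(proj_model[!is_mean, features])   # 2 x 4 projection matrix
mu <- unlist(proj_model[is_mean, features])      # overall feature means

# Assumption: center on the overall mean, then apply the projection matrix.
project <- function(x) as.vector(w %*% (x[features] - mu))

# First row of the example training data (ID 0).
x <- c(X1 = 5.1, X2 = 3.5, X3 = 1.4, X4 = 0.2)
z <- project(x)
length(z)   # one coordinate per discriminant
```

Under this assumption the four features are reduced to two discriminant coordinates, and the overall mean itself maps to the origin.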

Examples

## Not run: 
  The training DataFrame data:
 > data

   ID   X1   X2   X3   X4            CLASS
   0   5.1  3.5  1.4  0.2      Iris-setosa
   1   4.9  3.0  1.4  0.2      Iris-setosa
   2   4.7  3.2  1.3  0.2      Iris-setosa
   3   4.6  3.1  1.5  0.2      Iris-setosa
   4   5.0  3.6  1.4  0.2      Iris-setosa
   5   5.4  3.9  1.7  0.4      Iris-setosa
   ......
   24  6.5  3.0  5.8  2.2   Iris-virginica
   25  7.6  3.0  6.6  2.1   Iris-virginica
   26  4.9  2.5  4.5  1.7   Iris-virginica
   27  7.3  2.9  6.3  1.8   Iris-virginica
   28  6.7  2.5  5.8  1.8   Iris-virginica
   29  7.2  3.6  6.1  2.5   Iris-virginica

   Set up a 'DiscriminantAnalysis' object lda:

  > lda <- hanaml.DiscriminantAnalysis(conn.context,
                                       data = data,
                                       key = 'ID',
                                       label = 'CLASS',
                                       regularization.type = "mixing",
                                       regularization.amount = 0.5,
                                       projection = TRUE)

  Expected output:

  > lda$coef$Collect()
                CLASS   COEFF_X1   COEFF_X2   COEFF_X3   COEFF_X4   INTERCEPT
   0      Iris-setosa  23.907391  51.754001 -34.641902 -49.063407 -113.235478
   1  Iris-versicolor   0.511034  15.652078  15.209568  -4.861018  -53.898190
   2   Iris-virginica -14.729636   4.981955  42.511486  12.315007  -94.143564

  > lda$proj.model$Collect()
                NAME        X1        X2        X3        X4
   0  DISCRIMINANT_1  1.907978  2.399516 -3.846154 -3.112216
   1  DISCRIMINANT_2  3.046794 -4.575496 -2.757271  2.633037
   2    OVERALL_MEAN  5.843333  3.040000  3.863333  1.213333


## End(Not run)
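
The `coef` output above can be read as one linear score function per class. The following sketch needs no HANA connection: it copies the printed coefficients into a local data frame and scores the first training row. It assumes that the score of a class is the sum of each coefficient times its feature plus the intercept, and that the class with the largest score is chosen. How PAL itself produces predictions is not stated on this page.

```r
# Coefficients copied from the lda$coef output shown above.
coef <- data.frame(
  CLASS = c("Iris-setosa", "Iris-versicolor", "Iris-virginica"),
  COEFF_X1 = c(23.907391, 0.511034, -14.729636),
  COEFF_X2 = c(51.754001, 15.652078, 4.981955),
  COEFF_X3 = c(-34.641902, 15.209568, 42.511486),
  COEFF_X4 = c(-49.063407, -4.861018, 12.315007),
  INTERCEPT = c(-113.235478, -53.898190, -94.143564)
)

# First row of the example training data (ID 0).
x <- c(X1 = 5.1, X2 = 3.5, X3 = 1.4, X4 = 0.2)

# Assumed linear score: coefficients times features, plus the intercept.
w <- as.matrix(coef[, c("COEFF_X1", "COEFF_X2", "COEFF_X3", "COEFF_X4")])
scores <- as.vector(w %*% x) + coef$INTERCEPT
names(scores) <- coef$CLASS

# The class with the largest score is taken as the prediction.
predicted <- names(scores)[which.max(scores)]
predicted   # "Iris-setosa", matching the label of row 0
```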

[Package hana.ml.r version 1.0.8 Index]