hanaml.DiscriminantAnalysis.Rd
hanaml.DiscriminantAnalysis is an R wrapper for SAP HANA PAL Linear Discriminant Analysis.
hanaml.DiscriminantAnalysis(
data = NULL,
key = NULL,
features = NULL,
label = NULL,
regularization.type = NULL,
regularization.amount = NULL,
projection = NULL
)
data DataFrame
DataFrame containing the data.
key character
Name of the ID column.
features character or list of characters, optional
Names of the feature columns.
If not provided, it defaults to all non-key, non-label columns of data.
label character, optional
Name of the column which specifies the dependent variable.
Defaults to the last column of data if not provided.
regularization.type character, optional
The strategy for handling ill-conditioning or rank-deficiency
of the empirical covariance matrix.
'mixing'
: uses a regularized covariance estimate.
'diag'
: uses a diagonal covariance estimate.
'pseudo'
: uses a pseudo-inverse covariance estimate.
Defaults to 'mixing'.
regularization.amount double, optional
The convex mixing weight assigned to the diagonal matrix
obtained from the diagonal of the empirical covariance matrix.
Valid range for this parameter is (0,1).
Valid only when regularization.type is 'mixing'.
Defaults to the smallest number in (0,1) that makes the
regularized empirical covariance matrix invertible.
projection logical, optional
Whether or not to compute the projection model.
Defaults to TRUE.
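As a rough illustration of what the three regularization.type strategies do to the covariance matrix before it is inverted, here is a minimal base-R sketch. It assumes that 'mixing' forms the convex combination (1 - regularization.amount) * S + regularization.amount * diag(S), following the description of regularization.amount above. The helpers inv_cov() and pinv() are hypothetical, are not part of this package, and need not match PAL's internal computation; the sketch also does not reproduce PAL's default choice of regularization.amount.

```r
# Hypothetical helper (not a package function): Moore-Penrose pseudo inverse via SVD
pinv <- function(A, tol = sqrt(.Machine$double.eps)) {
  s <- svd(A)
  keep <- s$d > tol * max(s$d)                 # drop numerically zero singular values
  s$v[, keep, drop = FALSE] %*% (t(s$u[, keep, drop = FALSE]) / s$d[keep])
}

# Hypothetical helper: inverse covariance under each regularization.type strategy
inv_cov <- function(S, type = c("mixing", "diag", "pseudo"), amount = 0.5) {
  type <- match.arg(type)
  D <- diag(diag(S), nrow = nrow(S))           # diagonal matrix built from diag(S)
  switch(type,
    mixing = {
      stopifnot(amount > 0, amount < 1)        # documented valid range is (0,1)
      solve((1 - amount) * S + amount * D)     # assumed convex mix of S and its diagonal
    },
    diag   = solve(D),                         # ignore all covariances
    pseudo = pinv(S))                          # pseudo inverse of the raw estimate
}

# A rank-deficient covariance: two perfectly correlated features
S <- matrix(1, nrow = 2, ncol = 2)
inv_cov(S, "mixing", amount = 0.5)
inv_cov(S, "diag")
inv_cov(S, "pseudo")
```

Calling solve(S) directly on this S fails because the matrix is singular, whereas each of the three strategies returns a finite matrix.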
Returns a "DiscriminantAnalysis" object with the following attributes:
basic.info DataFrame
Basic information about the training data
for linear discriminant analysis.
priors DataFrame
The empirical priors for each class in the training data.
coef DataFrame
The coefficients and intercept of the linear discriminant function
for each class.
proj.info DataFrame
Projection related info, such as standard deviations of the discriminants,
variance proportion to the total variance explained by each discriminant, etc.
proj.model DataFrame
The projection matrix and overall means for features.
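The coef table holds one linear discriminant function per class. Assuming a sample is assigned to the class whose function gives the largest score (the usual LDA decision rule; PAL's own prediction step is not reproduced here), the coefficients printed by lda$coef$Collect() in the example can be applied locally as in this base-R sketch. lda_scores() and lda_predict() are hypothetical helpers, not package functions.

```r
# Linear discriminant coefficients copied from lda$coef$Collect() in the example
coef_tab <- data.frame(
  CLASS     = c("Iris-setosa", "Iris-versicolor", "Iris-virginica"),
  COEFF_X1  = c(23.907391, 0.511034, -14.729636),
  COEFF_X2  = c(51.754001, 15.652078, 4.981955),
  COEFF_X3  = c(-34.641902, 15.209568, 42.511486),
  COEFF_X4  = c(-49.063407, -4.861018, 12.315007),
  INTERCEPT = c(-113.235478, -53.898190, -94.143564),
  stringsAsFactors = FALSE
)

# Hypothetical helper: scores[i, k] = sum_j X[i, j] * COEFF_Xj[k] + INTERCEPT[k]
lda_scores <- function(X, coef_tab) {
  W <- as.matrix(coef_tab[, c("COEFF_X1", "COEFF_X2", "COEFF_X3", "COEFF_X4")])
  b <- matrix(coef_tab$INTERCEPT, nrow(X), nrow(coef_tab), byrow = TRUE)
  scores <- X %*% t(W) + b
  colnames(scores) <- coef_tab$CLASS
  scores
}

# Hypothetical helper: assumed rule, pick the class with the largest score
lda_predict <- function(X, coef_tab) {
  scores <- lda_scores(X, coef_tab)
  colnames(scores)[max.col(scores, ties.method = "first")]
}

# First and last rows of the example input (features X1..X4)
X <- rbind(c(5.1, 3.5, 1.4, 0.2),
           c(7.2, 3.6, 6.1, 2.5))
print(lda_predict(X, coef_tab))
```

With these coefficients the first example row scores highest for Iris-setosa and the last row for Iris-virginica, matching their labels in the input data.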
Linear discriminant analysis for classification and data reduction.
Input DataFrame data:
> data$Collect()
  ID X1 X2 X3 X4 CLASS
1 0 5.1 3.5 1.4 0.2 Iris-setosa
2 1 4.9 3.0 1.4 0.2 Iris-setosa
3 2 4.7 3.2 1.3 0.2 Iris-setosa
4 3 4.6 3.1 1.5 0.2 Iris-setosa
5 4 5.0 3.6 1.4 0.2 Iris-setosa
6 5 5.4 3.9 1.7 0.4 Iris-setosa
......
25 24 6.5 3.0 5.8 2.2 Iris-virginica
26 25 7.6 3.0 6.6 2.1 Iris-virginica
27 26 4.9 2.5 4.5 1.7 Iris-virginica
28 27 7.3 2.9 6.3 1.8 Iris-virginica
29 28 6.7 2.5 5.8 1.8 Iris-virginica
30 29 7.2 3.6 6.1 2.5 Iris-virginica
Call the function:
> lda <- hanaml.DiscriminantAnalysis(data = data,
key = "ID",
label = "CLASS",
regularization.type = "mixing",
regularization.amount = 0.5,
projection = TRUE)
Output:
> lda$coef$Collect()
CLASS COEFF_X1 COEFF_X2 COEFF_X3 COEFF_X4 INTERCEPT
1 Iris-setosa 23.907391 51.754001 -34.641902 -49.063407 -113.235478
2 Iris-versicolor 0.511034 15.652078 15.209568 -4.861018 -53.898190
3 Iris-virginica -14.729636 4.981955 42.511486 12.315007 -94.143564
> lda$proj.model$Collect()
NAME X1 X2 X3 X4
1 DISCRIMINANT_1 1.907978 2.399516 -3.846154 -3.112216
2 DISCRIMINANT_2 3.046794 -4.575496 -2.757271 2.633037
3 OVERALL_MEAN 5.843333 3.040000 3.863333 1.213333
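The proj.model rows can be used to map samples onto the discriminant axes. The base-R sketch below assumes that a sample is first centered on OVERALL_MEAN and then multiplied by the DISCRIMINANT_* rows; that centering convention is an assumption rather than documented PAL behavior, and lda_project() is a hypothetical helper.

```r
# Projection matrix and overall means copied from lda$proj.model$Collect()
P <- rbind(DISCRIMINANT_1 = c(1.907978, 2.399516, -3.846154, -3.112216),
           DISCRIMINANT_2 = c(3.046794, -4.575496, -2.757271, 2.633037))
overall_mean <- c(5.843333, 3.040000, 3.863333, 1.213333)

# Hypothetical helper: center on the overall means, then project onto the discriminants
lda_project <- function(X, P, overall_mean) {
  centered <- sweep(X, 2, overall_mean)   # subtract each feature's overall mean
  z <- centered %*% t(P)                  # one column per discriminant
  colnames(z) <- rownames(P)
  z
}

X <- rbind(c(5.1, 3.5, 1.4, 0.2),   # first row of the example input (Iris-setosa)
           c(7.2, 3.6, 6.1, 2.5))   # last row of the example input (Iris-virginica)
print(lda_project(X, P, overall_mean))
```

Under this assumption the setosa row lands on the positive side of DISCRIMINANT_1 and the virginica row on the negative side.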