hanaml.PCA {hana.ml.r}    R Documentation

Principal Component Analysis (PCA)

Description

hanaml.PCA is an R wrapper for PAL PCA.

Usage

hanaml.PCA(conn.context, data, key, features = NULL,
           formula = NULL, scaling = NULL, thread.ratio = NULL,
           scores = NULL)

Arguments

conn.context

ConnectionContext
The connection to the SAP HANA system.

data

DataFrame
DataFrame containing the data.

key

character
Name of the ID column of data.

features

list of character, optional
Names of the feature columns. If features is not provided, it defaults to all non-ID, non-label columns.

formula

formula type, optional
Formula to be used for model generation, in the form label~<feature_list>, e.g. formula=CATEGORY~V1+V2+V3.
Provide either the formula or a feature and label combination, not both.
Defaults to NULL.

scaling

logical, optional
If TRUE, scale variables to have unit variance before the analysis
takes place.
Defaults to FALSE.

thread.ratio

double, optional
Controls the proportion of available threads to use.
The value range is from 0 to 1, where 0 indicates a single thread,
and 1 indicates up to all available threads. Values between 0 and 1
will use that percentage of available threads. Values outside this
range tell PAL to heuristically determine the number of threads to use.
No default value.

scores

logical, optional
If TRUE, output the scores on each principal component when fitting.
Defaults to FALSE.

Format

R6Class object.

Details

Principal component analysis reduces the dimensionality of multivariate data using Singular Value Decomposition.
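
As a rough illustration of the idea (not of PAL's internal implementation), the following base R sketch computes principal components through a Singular Value Decomposition of the centred and optionally scaled data. The helper name pca_svd and the toy data are hypothetical, and the n - 1 divisor follows base R's prcomp; PAL's normalisation conventions may differ.

```r
# Minimal PCA-via-SVD sketch in base R. pca_svd is a hypothetical helper,
# not part of hana.ml.r, and the data below is synthetic.
pca_svd <- function(x, scaling = FALSE) {
  x <- as.matrix(x)
  # Centre every column; divide by the column standard deviation if requested.
  z <- scale(x, center = TRUE, scale = scaling)
  s <- svd(z)
  # Component standard deviations (n - 1 divisor, as in prcomp).
  comp_sd <- s$d / sqrt(nrow(x) - 1)
  list(
    loadings = s$v,                      # columns are the component directions
    sd       = comp_sd,
    var_prop = comp_sd^2 / sum(comp_sd^2),
    scores   = z %*% s$v                 # data projected onto the components
  )
}

set.seed(1)
x <- matrix(rnorm(40), nrow = 10, ncol = 4,
            dimnames = list(NULL, paste0("X", 1:4)))

fit <- pca_svd(x, scaling = TRUE)
ref <- prcomp(x, center = TRUE, scale. = TRUE)

# Agreement with prcomp, up to the sign of each component.
print(all.equal(fit$sd, ref$sdev))
print(all.equal(abs(fit$loadings), abs(unname(ref$rotation))))
```

In this sketch each column of loadings is one component, so t(fit$loadings) has one row per component, which is the orientation of the loadings table shown in the Examples section.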

Value

Returns a "PCA" object. Its values include the DataFrames loadings, loadings.stat and scaling.stat, as accessed in the example below.

See Also

transform.PCA

Examples

## Not run: 
## Input DataFrame df for training:
> df$Head(4)$Collect()
   ID    X1    X2    X3    X4
0   1  12.0  52.0  20.0  44.0
1   2  12.0  57.0  25.0  45.0
2   3  12.0  54.0  21.0  45.0
3   4  13.0  52.0  21.0  46.0

> pca <- hanaml.PCA(conn.context = conn, data = df, key = "ID",
                    scaling = TRUE, thread.ratio = 0.5, scores = TRUE)

## Output:
> pca$loadings$Collect()
  COMPONENT_ID  LOADINGS_X1  LOADINGS_X2  LOADINGS_X3  LOADINGS_X4
0        Comp1     0.541547     0.321424     0.511941     0.584235
1        Comp2    -0.454280     0.728287     0.395819    -0.326429
2        Comp3    -0.171426    -0.600095     0.760875    -0.177673
3        Comp4    -0.686273    -0.078552    -0.048095     0.721489
> pca$loadings.stat$Collect()
  COMPONENT_ID        SD  VAR_PROP  CUM_VAR_PROP
0        Comp1  1.566624  0.613577      0.613577
1        Comp2  1.100453  0.302749      0.916327
2        Comp3  0.536973  0.072085      0.988412
3        Comp4  0.215297  0.011588      1.000000
> pca$scaling.stat$Collect()
   VARIABLE_ID       MEAN     SCALE
0            1  17.000000  5.039841
1            2  53.636364  1.689540
2            3  23.000000  2.000000
3            4  48.454545  4.655398

## End(Not run)
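
The columns of the loadings.stat output above are related in a simple way, which the following base R sketch works through: VAR_PROP is each squared SD divided by the sum of squared SDs, and CUM_VAR_PROP is its running total. The SD values are copied from that output; the variable names and the 95% threshold are illustrative assumptions. Because scaling=TRUE standardises the four variables, the squared SDs sum to approximately 4.

```r
# SD values copied from the pca$loadings.stat output shown above.
comp_sd <- c(Comp1 = 1.566624, Comp2 = 1.100453,
             Comp3 = 0.536973, Comp4 = 0.215297)

# Proportion of variance explained by each component, and its running total.
var_prop     <- comp_sd^2 / sum(comp_sd^2)
cum_var_prop <- cumsum(var_prop)

print(round(data.frame(SD = comp_sd,
                       VAR_PROP = var_prop,
                       CUM_VAR_PROP = cum_var_prop), 6))

# Smallest number of components whose cumulative proportion reaches an
# illustrative 95% threshold (the threshold is an assumption, not a PAL setting).
n_keep <- unname(which(cum_var_prop >= 0.95)[1])
print(n_keep)
```

A threshold of this kind is one common way to decide how many components to retain before calling transform.PCA.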


[Package hana.ml.r version 1.0.8 Index]