hanaml.PCA
hanaml.PCA is an R wrapper for the SAP HANA PAL principal component analysis (PCA) procedure.
hanaml.PCA(
data = NULL,
key = NULL,
features = NULL,
formula = NULL,
scaling = NULL,
thread.ratio = NULL,
scores.output = NULL
)
data : DataFrame
DataFrame containing the data.
key : character
Name of the ID column.
features : character or list of characters, optional
Names of the feature columns.
If not provided, it defaults to all non-key, non-label columns of data.
formula : formula, optional
Formula to be used for model generation, in the format label ~ <feature_list>,
e.g. formula = CATEGORY ~ V1 + V2 + V3.
You can provide either the formula or a combination of features and label,
but not both.
Defaults to NULL.
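The following base-R sketch shows how a formula of the form label ~ <feature_list> splits into a label name and feature names, and how the documented default for features (all non-key, non-label columns) could be derived. It only illustrates the convention: it is not hanaml's internal parsing code, and the column set is hypothetical.

```r
# Illustration only: split a two-sided formula into label and features.
f <- CATEGORY ~ V1 + V2 + V3
label <- all.vars(f[[2]])      # left-hand side: "CATEGORY"
features <- all.vars(f[[3]])   # right-hand side: "V1" "V2" "V3"

# Hypothetical column names, used to mimic the documented default for
# `features`: every column that is neither the key ("ID") nor the label.
columns <- c("ID", "V1", "V2", "V3", "CATEGORY")
default.features <- setdiff(columns, c("ID", label))

print(label)
print(features)
print(default.features)
```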
scaling : logical, optional
If TRUE, scale variables to have unit variance before the analysis takes place.
Defaults to FALSE.
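As a rough illustration of what scaling = TRUE means, the sketch below standardizes the four input rows shown in the example further down: each column is centred on its mean and divided by its sample standard deviation, so every column ends up with variance 1. Using R's scale() (which divides by the n - 1 standard deviation) is an assumption; PAL's exact estimator is not stated here.

```r
# The four example rows (ID 1-4) from the Head(4) output, one column per variable.
X <- matrix(c(12, 12, 12, 13,
              52, 57, 54, 52,
              20, 25, 21, 21,
              44, 45, 45, 46), ncol = 4,
            dimnames = list(NULL, c("X1", "X2", "X3", "X4")))

# Centre each column on its mean and divide by its sample standard deviation.
Z <- scale(X, center = TRUE, scale = TRUE)

apply(Z, 2, var)   # each column now has unit variance
colMeans(Z)        # and (numerically) zero mean
```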
thread.ratio : double, optional
Controls the proportion of available threads that can be used by this function.
The value range is from 0 to 1, where 0 indicates a single thread
and 1 indicates all available threads. Values between 0 and 1 will use up to
that fraction of the available threads.
Values outside the range from 0 to 1 are ignored, and the actual number of threads
used is then determined heuristically.
Defaults to -1.
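The mapping described above can be sketched with a hypothetical helper, threads.for.ratio(), which is not part of hanaml. Rounding down and the minimum of one thread are assumptions chosen to match "0 indicates a single thread"; for out-of-range values the helper returns NA because PAL decides heuristically.

```r
# Hypothetical helper (not part of hanaml): thread.ratio -> thread count.
threads.for.ratio <- function(ratio, available) {
  # Outside [0, 1] the value is ignored and PAL picks a count heuristically,
  # so this sketch returns NA instead of guessing.
  if (is.na(ratio) || ratio < 0 || ratio > 1) return(NA_integer_)
  # 0 -> a single thread; otherwise use up to ratio * available threads.
  max(1L, as.integer(floor(ratio * available)))
}

threads.for.ratio(0, 8)
threads.for.ratio(0.5, 8)
threads.for.ratio(1, 8)
threads.for.ratio(-1, 8)
```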
scores.output : logical, optional
If TRUE, output the scores on each principal component when fitting.
Defaults to FALSE.
Returns an R6 object of class "PCA" with the following attributes and methods:
Attributes
loadings : DataFrame
The weights by which each standardized original variable should be
multiplied when computing component scores.
loadings.stat : DataFrame
Loading statistics on each component.
scores : DataFrame
The transformed variable values corresponding to each data point.
Set to NULL if scores.output is FALSE.
scaling.stat : DataFrame
Mean and scale values of each variable.
model : list of DataFrames
The fitted model.
Methods
CreateModelState(model=NULL, algorithm=NULL, func=NULL, state.description="ModelState", force=FALSE)
Usage:
> pca <- hanaml.PCA(data=df, key="ID")
> pca$CreateModelState()
Arguments:
model: DataFrame
DataFrame containing the model for parsing.
Defaults to self$model.
algorithm: character
Specifies the PAL algorithm associated with model.
Defaults to self$pal.algorithm.
func: character
Specifies the functionality for Unified Classification/Regression.
Valid only for object instances of R6Class "UnifiedClassification" or "UnifiedRegression".
Defaults to self$func.
state.description: character
A summary string for the generated model state.
Defaults to "ModelState".
force: logical
Specifies whether or not to replace the existing state for model.
Defaults to FALSE.
After calling this method, an attribute state that contains the parsed info for model
is assigned to the corresponding R6 object.
DeleteModelState(state=NULL)
Usage:
Assuming we have trained a hanaml model and created its model state as follows:
> pca <- hanaml.PCA(data=df, key="ID")
> pca$CreateModelState()
After using the model state for real-time scoring, we can delete the state by calling:
> pca$DeleteModelState()
Arguments:
state: DataFrame
DataFrame containing the state info.
Defaults to self$state.
After calling this method, the specified model state is cleaned up and the associated memory is released.
Principal component analysis reduces the dimensionality of multivariate data using singular value decomposition (SVD).
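To make the SVD-based procedure concrete, here is a minimal base-R sketch on random data (not the example data below, and not PAL's implementation). After standardizing, the right singular vectors give the loadings, the projected data give the scores, and the singular values divided by sqrt(n - 1) give each component's standard deviation. The result is cross-checked against R's prcomp(); loadings are compared in absolute value because the sign of a component is arbitrary.

```r
set.seed(1)
X <- matrix(rnorm(40), nrow = 10, ncol = 4)

# Standardize, as with scaling = TRUE.
Z <- scale(X, center = TRUE, scale = TRUE)

s <- svd(Z)                           # Z = U diag(d) t(V)
loadings <- s$v                       # one column per principal component
scores <- Z %*% loadings              # equals s$u %*% diag(s$d)
comp.sd <- s$d / sqrt(nrow(Z) - 1)    # standard deviation of each component
var.prop <- comp.sd^2 / sum(comp.sd^2)

# Cross-check with base R's prcomp(), which is also SVD-based.
p <- prcomp(X, center = TRUE, scale. = TRUE)
all.equal(comp.sd, p$sdev)
```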
Input DataFrame data:
> data$Head(4)$Collect()
ID X1 X2 X3 X4
1 1 12.0 52.0 20.0 44.0
2 2 12.0 57.0 25.0 45.0
3 3 12.0 54.0 21.0 45.0
4 4 13.0 52.0 21.0 46.0
Call the function:
> pca <- hanaml.PCA(data = data,
key = "ID",
scaling=TRUE,
thread.ratio=0.5,
scores.output=TRUE)
Output:
> pca$loadings$Collect()
COMPONENT_ID LOADINGS_X1 LOADINGS_X2 LOADINGS_X3 LOADINGS_X4
1 Comp1 0.541547 0.321424 0.511941 0.584235
2 Comp2 -0.454280 0.728287 0.395819 -0.326429
3 Comp3 -0.171426 -0.600095 0.760875 -0.177673
4 Comp4 -0.686273 -0.078552 -0.048095 0.721489
> pca$loadings.stat$Collect()
COMPONENT_ID SD VAR_PROP CUM_VAR_PROP
1 Comp1 1.566624 0.613577 0.613577
2 Comp2 1.100453 0.302749 0.916327
3 Comp3 0.536973 0.072085 0.988412
4 Comp4 0.215297 0.011588 1.000000
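The loadings.stat columns can be reproduced from the SD column, assuming VAR_PROP is each component's variance (SD squared) divided by the total and CUM_VAR_PROP is its running sum. With scaling = TRUE the squared SDs sum to about 4, the number of standardized variables. The sketch below uses only the values printed above, and its results agree with the VAR_PROP and CUM_VAR_PROP columns up to rounding.

```r
# SD values copied from pca$loadings.stat above.
comp.sd <- c(Comp1 = 1.566624, Comp2 = 1.100453,
             Comp3 = 0.536973, Comp4 = 0.215297)

total.var <- sum(comp.sd^2)               # about 4 for 4 scaled variables
var.prop <- comp.sd^2 / total.var         # proportion of variance per component
cum.var.prop <- cumsum(var.prop)          # cumulative proportion

round(cbind(VAR_PROP = var.prop, CUM_VAR_PROP = cum.var.prop), 6)
```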
> pca$scaling.stat$Collect()
VARIABLE_ID MEAN SCALE
1 1 17.000000 5.039841
2 2 53.636364 1.689540
3 3 23.000000 2.000000
4 4 48.454545 4.655398
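Following the description of loadings (weights applied to each standardized variable), the sketch below combines the printed outputs: it standardizes the row with ID 1 using MEAN and SCALE from scaling.stat, then takes the weighted sum with the Comp1 loadings. It assumes VARIABLE_ID 1 to 4 correspond to X1 to X4; since pca$scores is not printed here, the value is not checked against PAL's own output.

```r
# Row ID 1 from the input data.
x1 <- c(X1 = 12.0, X2 = 52.0, X3 = 20.0, X4 = 44.0)

# MEAN and SCALE from pca$scaling.stat (VARIABLE_ID 1-4 assumed to be X1-X4).
mean.v  <- c(17.000000, 53.636364, 23.000000, 48.454545)
scale.v <- c(5.039841, 1.689540, 2.000000, 4.655398)

# Comp1 row of pca$loadings.
comp1 <- c(0.541547, 0.321424, 0.511941, 0.584235)

z <- (x1 - mean.v) / scale.v      # standardized values for ID 1
score.comp1 <- sum(z * comp1)     # weighted sum: score of ID 1 on Comp1
score.comp1
```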