hanaml.GaussianMixture is an R wrapper for the SAP HANA PAL Gaussian Mixture Model (GMM).
hanaml.GaussianMixture(
  data = NULL,
  key = NULL,
  features = NULL,
  n.components = NULL,
  init.param = NULL,
  init.centers = NULL,
  covariance.type = NULL,
  shared.covariance = NULL,
  thread.ratio = NULL,
  max.iter = NULL,
  category.weight = NULL,
  categorical.variable = NULL,
  error.tol = NULL,
  regularization = NULL,
  random.seed = NULL
)
| Argument | Description |
|---|---|
| data | |
| key | |
| features | |
| n.components | |
| init.param | |
| init.centers | |
| covariance.type | Defaults to "full". |
| shared.covariance | |
| thread.ratio | |
| max.iter | |
| category.weight | |
| categorical.variable | Valid only for variables of "INTEGER" type; omitted otherwise. |
| error.tol | |
| regularization | |
| random.seed | Defaults to 0. |
Returns a "GaussianMixture" object with the following values:
labels : DataFrame
Label assigned to each sample.
model : DataFrame
Model content.
stats : DataFrame
Statistical values.
Input DataFrame data:
> data$Collect()
  ID   X1   X2 X3
   0 0.10 0.10  1
   1 0.11 0.10  1
   2 0.10 0.11  1
   3 0.11 0.11  1
   4 0.12 0.11  1
Call the function:
> gmm <- hanaml.GaussianMixture(data = data,
                                key = "ID",
                                n.components = 2,
                                init.param = "k.means++",
                                covariance.type = "full",
                                shared.covariance = TRUE,
                                thread.ratio = 0,
                                max.iter = 100,
                                category.weight = 0.707,
                                error.tol = 2.5,
                                regularization = 2.5,
                                random.seed = 5)
Output:
> gmm$labels$Collect()
   ID CLUSTER_ID PROBABILITY
1   0          0           1
2   1          0           1
3   2          0           0
4   3          0           0
5   4          0           0
6   0          1           0
7   1          1           0
8   2          1           1
9   3          1           1
10  4          1           1
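
To make the CLUSTER_ID and PROBABILITY columns more concrete, here is a minimal base R sketch of how a Gaussian mixture assigns membership probabilities to samples. It does not call SAP HANA and is not PAL's implementation: the mixing weights, means and standard deviations are made-up assumptions, only the X1 column is used, and the numbers will not match the output above.

```r
# Posterior membership probabilities for a toy two-component,
# one-dimensional Gaussian mixture, using base R only.
# All parameter values below are assumptions chosen for illustration.
x       <- c(0.10, 0.11, 0.10, 0.11, 0.12)  # X1 column from the example data
weights <- c(0.5, 0.5)                       # assumed mixing weights
means   <- c(0.10, 0.13)                     # assumed component means
sds     <- c(0.01, 0.01)                     # assumed component standard deviations

# Weighted density of every sample under every component
# (rows = samples, columns = components).
dens <- sapply(seq_along(weights), function(k) {
  weights[k] * dnorm(x, mean = means[k], sd = sds[k])
})

# Normalise each row so that the probabilities for one sample sum to 1.
resp <- dens / rowSums(dens)

# Hard assignment: the component with the highest posterior probability.
# Subtracting 1 gives 0-based ids, mirroring the CLUSTER_ID values above.
cluster.id <- max.col(resp, ties.method = "first") - 1L

result <- data.frame(ID = seq_along(x) - 1L,
                     CLUSTER_ID = cluster.id,
                     PROBABILITY = apply(resp, 1, max))
print(result)
```

In a fitted model the weights, means and covariances are estimated from the data rather than fixed by hand; the sketch only shows how the per-sample probabilities follow once those parameters are known.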