hanaml.UnifiedClustering is an R wrapper for SAP HANA PAL Unified Clustering.
hanaml.UnifiedClustering( data = NULL, func = NULL, key = NULL, features = NULL, ... )
| Argument | Description |
|---|---|
| data | |
| func | |
| key | |
| features | |
| ... | |
Returns a "UnifiedClustering" object with the following attributes and methods:
labels DataFrame
DATA_ID - ID column in the input data.
CLUSTER_ID - The assigned cluster ID.
DISTANCE - Distance between a given point and the cluster center (k-means),
nearest core object (DBSCAN), or weight vector (SOM); or the probability
of a given point belonging to the corresponding cluster (GMM).
SLIGHT_SILHOUETTE - Estimated value (slight silhouette).
centers DataFrame
CLUSTER_ID
VARIABLE_NAME - The name of variable.
VALUE - The value of variable.
model DataFrame
ROW_INDEX - model row index.
PART_INDEX - Specifically for GMM's CLUSTER_ID.
MODEL_CONTENT - model content.
statistics DataFrame
STAT_NAME - Statistics name.
STAT_VALUE - Statistics value.
optimal.param DataFrame
PARM_NAME - parameter name.
INT_VALUE - integer value.
DOUBLE_VALUE - double value.
STRING_VALUE - character value.
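The centers table described above is in long form, with one row per cluster and variable. A minimal base R sketch of pivoting such a table to one row per cluster follows. The values in centers_long are made up for illustration, and the sketch assumes the table has already been collected into a local data frame with a numeric VALUE column.

```r
# Hypothetical centers in the long layout described above.
# The numbers are illustrative only, not output from a fitted model.
centers_long <- data.frame(
  CLUSTER_ID    = c(0, 0, 1, 1),
  VARIABLE_NAME = c("V000", "V002", "V000", "V002"),
  VALUE         = c(1.0, 1.0, 16.0, 1.0),
  stringsAsFactors = FALSE
)

# Pivot to one row per cluster and one column per variable.
centers_wide <- reshape(
  centers_long,
  idvar     = "CLUSTER_ID",
  timevar   = "VARIABLE_NAME",
  direction = "wide"
)

# reshape() prefixes the new columns with "VALUE."; strip that prefix.
names(centers_wide) <- sub("^VALUE\\.", "", names(centers_wide))
rownames(centers_wide) <- NULL

print(centers_wide)
```

The wide layout is often easier to compare against the input features, at the cost of assuming every variable shares one value type.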
The training data:
> data.fit$Collect()
ID V000 V001 V002
1 0 0.5 A 0.5
2 1 1.5 A 0.5
3 2 1.5 A 1.5
4 3 0.5 A 1.5
5 4 1.1 B 1.2
6 5 0.5 B 15.5
7 6 1.5 B 15.5
8 7 1.5 B 16.5
9 8 0.5 B 16.5
10 9 1.2 C 16.1
11 10 15.5 C 15.5
12 11 16.5 C 15.5
13 12 16.5 C 16.5
14 13 15.5 C 16.5
15 14 15.6 D 16.2
16 15 15.5 D 0.5
17 16 16.5 D 0.5
18 17 16.5 D 1.5
19 18 15.5 D 1.5
20 19 15.7 A 1.6
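For readers who want to reproduce the table locally, the same 20 rows can be written as a base R data frame. This is only a sketch: the name data_local is an assumption, and uploading the frame to SAP HANA to obtain data.fit requires a database connection that is not shown here.

```r
# Local copy of the training data shown above (20 rows, 4 columns).
data_local <- data.frame(
  ID   = 0:19,
  V000 = c(0.5, 1.5, 1.5, 0.5, 1.1,
           0.5, 1.5, 1.5, 0.5, 1.2,
           15.5, 16.5, 16.5, 15.5, 15.6,
           15.5, 16.5, 16.5, 15.5, 15.7),
  V001 = c(rep("A", 4), rep("B", 5), rep("C", 5), rep("D", 5), "A"),
  V002 = c(0.5, 0.5, 1.5, 1.5, 1.2,
           15.5, 15.5, 16.5, 16.5, 16.1,
           15.5, 15.5, 16.5, 16.5, 16.2,
           0.5, 0.5, 1.5, 1.5, 1.6),
  stringsAsFactors = FALSE
)

# Inspect the column types: two numeric features and one categorical feature.
str(data_local)
```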
Create a UnifiedClustering model for Kmeans:
ukmeans <- hanaml.UnifiedClustering(data = data.fit,
                                    n.clusters = 4,
                                    init = "first.k",
                                    max.iter = 100,
                                    tol = 1.0E-6,
                                    thread.ratio = 1.0,
                                    distance.level = "Euclidean",
                                    category.weights = 0.5)
Check the labels:
> ukmeans$labels$Collect()
ID CLUSTER_ID DISTANCE SLIGHT_SILHOUETTE
1 0 0 0.891088 0.944370
2 1 0 0.863917 0.942478
3 2 0 0.806252 0.946288
4 3 0 0.835684 0.944942
......
17 16 1 0.976885 0.939386
18 17 1 0.818178 0.945878
19 18 1 0.722799 0.952170
20 19 1 1.102342 0.925679
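Once collected, the labels can be summarized per cluster with base R. The sketch below uses only the eight rows printed above (the elided rows are not reconstructed), so the means describe that excerpt rather than the full result. The name labels_shown is an assumption, and the column names follow the return value description.

```r
# The eight label rows printed above, copied into a local data frame.
labels_shown <- data.frame(
  ID                = c(0, 1, 2, 3, 16, 17, 18, 19),
  CLUSTER_ID        = c(0, 0, 0, 0, 1, 1, 1, 1),
  DISTANCE          = c(0.891088, 0.863917, 0.806252, 0.835684,
                        0.976885, 0.818178, 0.722799, 1.102342),
  SLIGHT_SILHOUETTE = c(0.944370, 0.942478, 0.946288, 0.944942,
                        0.939386, 0.945878, 0.952170, 0.925679)
)

# Number of listed points per cluster.
cluster_sizes <- table(labels_shown$CLUSTER_ID)

# Mean distance and mean slight silhouette per cluster.
summary_by_cluster <- aggregate(
  cbind(DISTANCE, SLIGHT_SILHOUETTE) ~ CLUSTER_ID,
  data = labels_shown,
  FUN  = mean
)

print(cluster_sizes)
print(summary_by_cluster)
```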