hanaml.AgglomerateHierarchical is an R wrapper for the SAP HANA PAL Agglomerate Hierarchical Clustering algorithm.
hanaml.AgglomerateHierarchical(
  data,
  key,
  features = NULL,
  n.clusters = NULL,
  affinity = NULL,
  linkage = NULL,
  thread.ratio = NULL,
  distance.dimension = NULL,
  normalization = NULL,
  category.weights = NULL,
  categorical.variable = NULL
)
| Argument | Description |
|---|---|
| data | |
| key | |
| features | |
| n.clusters | |
| affinity | Note that |
| linkage | When linkage is "centroid.clustering", "median.clustering", or "ward", affinity must be set to "squared.euclidean". |
| thread.ratio | |
| distance.dimension | |
| normalization | Defaults to "no". |
| category.weights | |
| categorical.variable | Valid only for variables of "INTEGER" type; omitted otherwise. |
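The constraint between linkage and affinity can be checked locally before the call is sent to the database. The sketch below is a hypothetical helper, `check_linkage_affinity`, which is not part of hana.ml.r. It assumes that a NULL affinity should be left to the server-side default, which this page does not state.

```r
# Hypothetical helper (not part of hana.ml.r): enforces the documented rule that
# "centroid.clustering", "median.clustering" and "ward" linkages require
# affinity = "squared.euclidean".
check_linkage_affinity <- function(linkage = NULL, affinity = NULL) {
  restricted <- c("centroid.clustering", "median.clustering", "ward")
  # Assumption: a NULL affinity is left to the server default, so only an
  # explicitly supplied, conflicting value is rejected here.
  if (!is.null(linkage) && linkage %in% restricted &&
      !is.null(affinity) && affinity != "squared.euclidean") {
    stop(sprintf("linkage \"%s\" requires affinity = \"squared.euclidean\"",
                 linkage))
  }
  invisible(TRUE)
}

# Passes silently: the combination used in the example on this page.
check_linkage_affinity("centroid.clustering", "squared.euclidean")
```

Calling the helper first turns a database-side failure into an immediate R error with a readable message.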
An "AgglomerateHierarchical" object with the following attributes:
labels : DataFrame
Label of each point, structured as follows:
1st column: ID (in input table) data type, ID, record ID.
2nd column: int, CLUSTER_ID, the range is from 0 to n.clusters - 1.
comb.process : DataFrame
Combine process, structured as follows:
1st column: int, STAGE, cluster stage.
2nd column: ID (in input table) data type, LEFT_ + ID (in input table) column name. One of the two clusters combined in a given stage, named by its row number in the input data table. After the merge, the new cluster takes the name of the left one.
3rd column: ID (in input table) data type, RIGHT_ + ID (in input table) column name. The other cluster combined in the same stage, named by its row number in the input data table.
4th column: float, DISTANCE. Distance between the two combined clusters.
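The shape of these two outputs can be explored without a HANA connection. The sketch below uses base R's `stats::hclust` as a local stand-in, not the PAL algorithm: it runs centroid linkage on squared Euclidean distances over the two numeric columns of the first five example points, then builds data frames that resemble `comb.process` and `labels`. The LEFT and RIGHT column names are illustrative, and `hclust` identifies clusters differently from PAL (negative numbers for single points, positive numbers for earlier merge steps), so only the overall structure is comparable.

```r
# Local illustration with base R only; this is NOT the PAL implementation.
pts <- data.frame(
  POINT = 0:4,
  X1 = c(0.5, 1.5, 1.5, 0.5, 1.1),
  X2 = c(0.5, 0.5, 1.5, 1.5, 1.2)
)

# Squared Euclidean distances, as centroid linkage expects.
d <- dist(pts[, c("X1", "X2")])^2
hc <- hclust(d, method = "centroid")

# One row per merge stage: n points always yield n - 1 stages.
comb <- data.frame(
  STAGE = seq_len(nrow(hc$merge)),
  LEFT = hc$merge[, 1],   # hclust convention: negative = single observation
  RIGHT = hc$merge[, 2],
  DISTANCE = hc$height
)

# One row per input point, with the cluster assigned when cutting to k groups.
labels <- data.frame(
  POINT = pts$POINT,
  CLUSTER_ID = cutree(hc, k = 2)
)

print(comb)
print(labels)
```

The same relationship holds in the example output on this page: 20 input points produce 19 combine stages.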
Input DataFrame data:
> data$Collect()
    POINT    X1   X2  X3
0       0   0.5  0.5   1
1       1   1.5  0.5   2
2       2   1.5  1.5   2
3       3   0.5  1.5   2
4       4   1.1  1.2   2
......
16     16  16.5  0.5   1
17     17  16.5  1.5   1
18     18  15.5  1.5   1
19     19  15.7  1.6   1
Call the function:
> AH <- hanaml.AgglomerateHierarchical(data = data,
                                       key = "POINT",
                                       n.clusters = 4,
                                       affinity = "squared.euclidean",
                                       linkage = "centroid.clustering",
                                       thread.ratio = 0,
                                       distance.dimension = 3,
                                       normalization = "no",
                                       category.weights = 0.1)
Output:
> AH$comb.process$Collect()
STAGE LEFT_POINT RIGHT_POINT DISTANCE
1 1 18 19 0.0187
2 2 13 14 0.025
3 3 7 9 0.0437
4 4 2 4 0.0438
......
16 16 15 16 0.1085
17 17 0 15 1.0381
18 18 5 10 1.0425
19 19 0 5 1.5146
> AH$labels$Collect()
POINT CLUSTER_ID
1 0 1
2 1 1
3 2 1
4 3 1
......
17 16 4
18 17 4
19 18 4
20 19 4