hanaml.AgglomerateHierarchical {hana.ml.r} | R Documentation |
hanaml.AgglomerateHierarchical is an R wrapper for the PAL Agglomerate Hierarchical Clustering algorithm.
hanaml.AgglomerateHierarchical(conn.context, data, key, features = NULL, n.clusters = NULL, affinity = NULL, linkage = NULL, thread.ratio = NULL, distance.dimension = NULL, normalization = NULL, category.weights = NULL, categorical.variable = NULL)
conn.context |
|
data |
|
key |
|
features |
|
n.clusters |
|
affinity |
Note that |
linkage |
Note that |
thread.ratio |
|
distance.dimension |
|
normalization |
Defaults to 0. |
category.weights |
|
categorical.variable |
|
An R6Class object.
labels : DataFrame
Label of each point, structured as follows:
- 1st column: ID (in input table) data type, ID, record ID.
- 2nd column: int, CLUSTER_ID, the range is from 0 to n.clusters - 1.
comb.process : DataFrame
The combining process, structured as follows:
- 1st column: int, STAGE, cluster stage.
- 2nd column: ID (in input table) data type, LEFT_ + ID (in input table) column name. One of the two clusters combined in this stage, named by its row number in the input data table. After combining, the new cluster is named after the left one.
- 3rd column: ID (in input table) data type, RIGHT_ + ID (in input table) column name. The other cluster combined in the same stage, named by its row number in the input data table.
- 4th column: float, DISTANCE. Distance between the two combined clusters.
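Because each merged cluster keeps the name of its left member, the comb.process table can be replayed stage by stage to recover cluster membership for any number of clusters. The following is a minimal base-R sketch of that idea, not part of hana.ml.r: `replay_merges` is a hypothetical helper, and the LEFT_POINT and RIGHT_POINT vectors are copied from the example output on this page.

```r
# Hypothetical helper (not part of hana.ml.r): replay the merge stages of a
# collected comb.process table until n_clusters clusters remain.
replay_merges <- function(ids, left, right, n_clusters) {
  # Each point starts as its own cluster, named by its own ID.
  cluster_of <- as.character(ids)
  names(cluster_of) <- as.character(ids)
  # Every merge reduces the number of clusters by one.
  n_merges <- length(ids) - n_clusters
  for (i in seq_len(n_merges)) {
    # The merged cluster keeps the name of the left cluster.
    cluster_of[cluster_of == as.character(right[i])] <- as.character(left[i])
  }
  cluster_of
}

# LEFT_POINT and RIGHT_POINT columns from the example output, in STAGE order.
left  <- c(18, 13, 7, 2, 2, 17, 6, 11, 11, 16, 6, 1, 0, 5, 10, 15, 0, 5, 0)
right <- c(19, 14, 9, 4, 3, 18, 7, 12, 13, 17, 8, 2, 1, 6, 11, 16, 15, 10, 5)

clusters <- replay_merges(0:19, left, right, n_clusters = 4)
print(split(names(clusters), clusters))
```

With the example data, replaying the first 16 stages leaves four clusters named 0, 5, 10 and 15, each holding five points. This is the same grouping as in the labels output, although the CLUSTER_ID values assigned by PAL differ from these names.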
## Not run:
Input DataFrame data:

> data$collect()
   POINT   X1   X2 X3
0      0  0.5  0.5  1
1      1  1.5  0.5  2
2      2  1.5  1.5  2
3      3  0.5  1.5  2
4      4  1.1  1.2  2
5      5  0.5 15.5  2
6      6  1.5 15.5  3
7      7  1.5 16.5  3
8      8  0.5 16.5  3
9      9  1.2 16.1  3
10    10 15.5 15.5  3
11    11 16.5 15.5  4
12    12 16.5 16.5  4
13    13 15.5 16.5  4
14    14 15.6 16.2  4
15    15 15.5  0.5  4
16    16 16.5  0.5  1
17    17 16.5  1.5  1
18    18 15.5  1.5  1
19    19 15.7  1.6  1

Create Agglomerate Hierarchical Clustering instance:

> AgglomerateHierarchical <- hanaml.AgglomerateHierarchical(conn.context = conn,
                                                            data = data,
                                                            key = "POINT",
                                                            n.clusters = 4,
                                                            affinity = 'squared.euclidean',
                                                            linkage = 'centroid.clustering',
                                                            thread.ratio = 0,
                                                            distance.dimension = 3,
                                                            normalization = "no",
                                                            category.weights = 0.1)

Expected output:

> AgglomerateHierarchical$comb.process.tbl$collect()
   STAGE LEFT_POINT RIGHT_POINT DISTANCE
1      1         18          19   0.0187
2      2         13          14   0.025
3      3          7           9   0.0437
4      4          2           4   0.0438
5      5          2           3   0.0594
6      6         17          18   0.0594
7      7          6           7   0.0594
8      8         11          12   0.0625
9      9         11          13   0.0906
10    10         16          17   0.0922
11    11          6           8   0.0953
12    12          1           2   0.0953
13    13          0           1   0.1727
14    14          5           6   0.1727
15    15         10          11   0.175
16    16         15          16   0.1085
17    17          0          15   1.0381
18    18          5          10   1.0425
19    19          0           5   1.5146

> AgglomerateHierarchical$labels$collect()
   POINT CLUSTER_ID
1      0          1
2      1          1
3      2          1
4      3          1
5      4          1
6      5          2
7      6          2
8      7          2
9      8          2
10     9          2
11    10          3
12    11          3
13    12          3
14    13          3
15    14          3
16    15          4
17    16          4
18    17          4
19    18          4
20    19          4

## End(Not run)
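For a quick local sanity check without a HANA connection, a similar clustering can be sketched with base R's `hclust`. This is only an approximation under stated assumptions: it uses the numeric columns X1 and X2 from the example input, omits X3 (base R's `dist()` has no counterpart to category.weights), and is not expected to reproduce PAL's DISTANCE values.

```r
# Numeric features X1 and X2 copied from the example input.
# Assumption: X3 is left out, since dist() has no categorical weighting.
x <- cbind(
  X1 = c(0.5, 1.5, 1.5, 0.5, 1.1, 0.5, 1.5, 1.5, 0.5, 1.2,
         15.5, 16.5, 16.5, 15.5, 15.6, 15.5, 16.5, 16.5, 15.5, 15.7),
  X2 = c(0.5, 0.5, 1.5, 1.5, 1.2, 15.5, 15.5, 16.5, 16.5, 16.1,
         15.5, 15.5, 16.5, 16.5, 16.2, 0.5, 0.5, 1.5, 1.5, 1.6)
)
rownames(x) <- 0:19

# Centroid linkage on squared Euclidean distances, then cut into 4 clusters.
hc <- hclust(dist(x)^2, method = "centroid")
local_labels <- cutree(hc, k = 4)
print(table(local_labels))
```

The four groups in this data set are far apart, so this local run should place points 0 to 4, 5 to 9, 10 to 14 and 15 to 19 in separate clusters, the same grouping as the labels output above.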