hanaml.KMedoid is an R wrapper for the SAP HANA PAL K-Medoids algorithm.
hanaml.KMedoid( data, key, features = NULL, n.clusters, init = NULL, max.iter = NULL, tol = NULL, thread.ratio = NULL, distance.level = NULL, minkowski.power = NULL, category.weights = NULL, normalization = NULL, categorical.variable = NULL )
| data | |
|---|---|
| key | |
| features | |
| n.clusters | |
| init | Defaults to "patent". |
| max.iter | |
| tol | |
| thread.ratio | |
| distance.level | Defaults to "euclidean". |
| minkowski.power | |
| category.weights | |
| normalization | Defaults to "no". |
| categorical.variable | VALID only for variables of "INTEGER" type, omitted otherwise. |
A "KMedoid" object with the following attributes:
labels : DataFrame
Label assigned to each sample.
cluster.centers : DataFrame
Coordinates of cluster centers.
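The K-Medoids clustering algorithm partitions n observations into K clusters, assigning each observation to its nearest cluster center. Unlike K-Means, which uses the mean of a cluster as its center, K-Medoids uses a medoid: an actual observation from the data set. Because a medoid cannot be dragged away from the data by extreme values, K-Medoids is more robust to noise and outliers than K-Means.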
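The difference between a medoid and a mean can be illustrated with a minimal base R sketch. It does not call PAL or hanaml.KMedoid; the first four points reuse the numeric columns V000 and V002 of the example rows with ID 0 to 3, and the fifth point is a hypothetical outlier added purely for illustration.

```r
# Four points taken from the numeric columns (V000, V002) of example rows ID 0-3,
# plus one hypothetical outlier at (10, 10) that is NOT part of the example data.
pts <- rbind(c(0.5, 0.5),
             c(1.5, 0.5),
             c(1.5, 1.5),
             c(0.5, 1.5),
             c(10, 10))

# Pairwise Euclidean distances between all points.
d <- as.matrix(dist(pts))

# The medoid is the observation with the smallest total distance to all others.
medoid_idx <- which.min(rowSums(d))
medoid <- pts[medoid_idx, ]

# The centroid (mean), by contrast, is pulled toward the outlier.
centroid <- colMeans(pts)
```

Here the medoid stays on an observed point, (1.5, 1.5), while the mean moves to (2.8, 2.8), which is not one of the observations.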
Input DataFrame data:
> data$Collect()
ID V000 V001 V002
1 0 0.5 A 0.5
2 1 1.5 A 0.5
3 2 1.5 A 1.5
4 3 0.5 A 1.5
5 4 1.1 B 1.2
......
16 15 15.5 D 0.5
17 16 16.5 D 0.5
18 17 16.5 D 1.5
19 18 15.5 D 1.5
20 19 15.7 A 1.6
Call the function:
> kmed <- hanaml.KMedoid(data = data,
                         key = "ID",
                         n.clusters = 4,
                         init = "first_k",
                         max.iter = 100,
                         tol = 1.0E-6,
                         thread.ratio = 0.3,
                         distance.level = "euclidean",
                         category.weights = 0.5)
Output:
> kmed$cluster.centers$Collect()
CLUSTER_ID V000 V001 V002
1 0 1.5 A 1.5
2 1 15.5 D 1.5
3 2 15.5 C 16.5
4 3 1.5 B 16.5
> kmed$labels$Collect()
ID CLUSTER_ID DISTANCE
1 0 0 1.4142136
2 1 0 1.0000000
3 2 0 0.0000000
4 3 0 1.0000000
......
17 16 1 1.4142136
18 17 1 1.0000000
19 18 1 0.0000000
20 19 1 0.9307136
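As a sanity check on the DISTANCE column, the first four values can be reproduced in base R without a HANA connection. This is only a sketch: it assumes that a matching categorical value (all four rows and the center of cluster 0 have V001 = "A") contributes nothing to the distance, so only the numeric columns V000 and V002 are used. It does not attempt to reproduce distances that involve a category mismatch, such as the row with ID 19.

```r
# Center of cluster 0 from cluster.centers (numeric columns V000 and V002 only).
center0 <- c(1.5, 1.5)

# Numeric columns (V000, V002) of the example rows with ID 0 to 3.
members <- rbind(c(0.5, 0.5),
                 c(1.5, 0.5),
                 c(1.5, 1.5),
                 c(0.5, 1.5))

# Euclidean distance of each row to the center:
# subtract the center column-wise, square, sum per row, take the square root.
dists <- sqrt(rowSums(sweep(members, 2, center0)^2))
round(dists, 7)
```

The rounded values are 1.4142136, 1, 0 and 1, matching the DISTANCE entries for ID 0 to 3. The row with distance 0 is the medoid itself.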