hanaml.KMedian is an R wrapper for the SAP HANA PAL KMedian algorithm.
hanaml.KMedian( data, key, features = NULL, n.clusters, init = NULL, max.iter = NULL, tol = NULL, thread.ratio = NULL, distance.level = NULL, minkowski.power = NULL, category.weights = NULL, normalization = NULL, categorical.variable = NULL )
| Argument | Description |
|---|---|
| data | |
| key | |
| features | |
| n.clusters | |
| init | |
| max.iter | |
| tol | |
| thread.ratio | |
| distance.level | Defaults to "euclidean". |
| minkowski.power | |
| category.weights | |
| normalization | Defaults to "no". |
| categorical.variable | VALID only for variables of "INTEGER" type, omitted otherwise. |
A "KMedian" object with the following attributes:
labels : DataFrame
Label assigned to each sample.
cluster.centers : DataFrame
Coordinates of cluster centers.
K-Medians is a clustering algorithm that partitions n observations into K clusters according to their nearest cluster center. It uses the median of each feature to calculate the cluster centers.
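The assign-then-update loop described above can be sketched in base R. This is only an illustration for numeric features with Euclidean distance and a "first k rows" initialization; it is not the PAL implementation, and the function name kmedians_sketch, its arguments, and the toy matrix are assumptions made for this example.

```r
# Minimal K-medians sketch for numeric features only (NOT the PAL implementation).
# kmedians_sketch, max_iter and tol are hypothetical names used for illustration.
kmedians_sketch <- function(x, k, max_iter = 100, tol = 1e-6) {
  x <- as.matrix(x)
  centers <- x[seq_len(k), , drop = FALSE]  # use the first k rows as initial centers
  labels <- integer(nrow(x))
  for (iter in seq_len(max_iter)) {
    # Euclidean distance from every point to every center (n x k matrix).
    d <- matrix(
      vapply(seq_len(k),
             function(j) sqrt(rowSums(sweep(x, 2, centers[j, ])^2)),
             numeric(nrow(x))),
      nrow = nrow(x)
    )
    # Assign each point to its nearest center.
    labels <- max.col(-d, ties.method = "first")
    # Move each center to the per-feature median of its members.
    new_centers <- centers
    for (j in seq_len(k)) {
      members <- x[labels == j, , drop = FALSE]
      if (nrow(members) > 0) {
        new_centers[j, ] <- apply(members, 2, median)
      }
    }
    shift <- max(abs(new_centers - centers))
    centers <- new_centers
    if (shift <= tol) break  # stop once the centers no longer move
  }
  list(labels = labels, centers = centers)
}

# Toy data: two well-separated groups of three points each.
x <- rbind(c(0.5, 0.5), c(16.5, 0.5), c(1.5, 0.5),
           c(1.5, 1.5), c(16.5, 1.5), c(15.5, 1.5))
res <- kmedians_sketch(x, k = 2)
print(res$labels)
print(res$centers)
```

Because each center coordinate is a median rather than a mean, a single extreme value in a cluster moves the center less than it would under K-Means.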
Input DataFrame data:
> data$Collect()
ID V000 V001 V002
1 0 0.5 A 0.5
2 1 1.5 A 0.5
3 2 1.5 A 1.5
4 3 0.5 A 1.5
5 4 1.1 B 1.2
......
17 16 16.5 D 0.5
18 17 16.5 D 1.5
19 18 15.5 D 1.5
20 19 15.7 A 1.6
Call the function:
> kmedian <- hanaml.KMedian(data = data,
                            key = "ID",
                            n.clusters = 4,
                            init = "first_k",
                            max.iter = 100,
                            tol = 1.0E-6,
                            thread.ratio = 0.3,
                            distance.level = "euclidean",
                            category.weights = 0.5)
Output:
> kmedian$cluster.centers$Collect()
CLUSTER_ID V000 V001 V002
1 0 1.1 A 1.2
2 1 15.7 D 1.5
3 2 15.6 C 16.2
4 3 1.2 B 16.1
> kmedian$labels$Collect()
ID CLUSTER_ID DISTANCE
1 0 0 0.9219544
2 1 0 0.8062258
3 2 0 0.5000000
4 3 0 0.6708204
......
17 16 1 1.2806248
18 17 1 0.8000000
19 18 1 0.2000000
20 19 1 0.8071068
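As a sanity check on the output above, the DISTANCE values of the rows whose V001 category matches that of their assigned center can be reproduced from the numeric columns alone with a plain Euclidean distance. The sketch below copies those rows and the two relevant centers by hand into local data frames; the names pts, centers, C000 and C002 are introduced here for illustration only. Row 19 (ID 19) is left out because its category differs from its center's, and how PAL combines the weighted categorical term is not shown on this page.

```r
# Rows copied from the example input whose V001 equals the V001 of their center.
pts <- data.frame(
  ID         = c(0, 1, 2, 3, 16, 17, 18),
  V000       = c(0.5, 1.5, 1.5, 0.5, 16.5, 16.5, 15.5),
  V002       = c(0.5, 0.5, 1.5, 1.5, 0.5, 1.5, 1.5),
  CLUSTER_ID = c(0, 0, 0, 0, 1, 1, 1)
)

# Numeric coordinates of centers 0 and 1, copied from cluster.centers.
centers <- data.frame(
  CLUSTER_ID = c(0, 1),
  C000       = c(1.1, 15.7),
  C002       = c(1.2, 1.5)
)

# Look up each point's center and compute the Euclidean distance to it.
idx <- match(pts$CLUSTER_ID, centers$CLUSTER_ID)
pts$DISTANCE <- sqrt((pts$V000 - centers$C000[idx])^2 +
                     (pts$V002 - centers$C002[idx])^2)

print(pts[, c("ID", "CLUSTER_ID", "DISTANCE")])
```

The computed values agree with the DISTANCE column shown for IDs 0 to 3 and 16 to 18.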