Unified Clustering

hanaml.UnifiedClustering is an R wrapper for SAP HANA PAL Unified Clustering.

hanaml.UnifiedClustering(
  data = NULL,
  func = NULL,
  key = NULL,
  features = NULL,
  ...
)

Arguments

data	`DataFrame` DataFrame containing the data.
func	`character` The functionality for unified clustering. Valid values are as follows: "AgglomerateHierarchicalClustering", "DBSCAN", "GaussianMixture", "AcceleratedKMeans", "KMeans", "KMedians", "KMedoids", "SOM".
key	`character` Name of the ID column.
features	`character or list of characters, optional` Names of the feature columns. If not provided, it defaults to all non-key columns of `data`.
...	Other parameters for training a clustering model with the functionality specified in `func`. See the documentation of the corresponding function for details: `hanaml.AgglomerateHierarchical, hanaml.DBSCAN, hanaml.GaussianMixture, hanaml.KMeans, hanaml.KMedian, hanaml.KMedoid, hanaml.SOM`

Value

Returns a "UnifiedClustering" object with the following attributes and methods:

labels DataFrame

DATA_ID - ID column in the input data.
CLUSTER_ID - The assigned cluster ID.
DISTANCE - Distance between a given point and the cluster center (k-means), the nearest core object (DBSCAN), or the weight vector (SOM); or the probability of a given point belonging to the corresponding cluster (GMM).
SLIGHT_SILHOUETTE - Estimated value (slight silhouette).

centers DataFrame

CLUSTER_ID
VARIABLE_NAME - The name of the variable.
VALUE - The value of the variable.
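
Because `centers` is stored in long format (one row per cluster and variable), it can be convenient to pivot it into one row per cluster after collecting it. The following is a minimal base R sketch under stated assumptions: the `centers` data frame is a hypothetical stand-in for the collected result, its values are made up for illustration, and it assumes `Collect()` yields an ordinary R data frame.

```r
# Hypothetical stand-in for the collected centers table; values are invented.
centers <- data.frame(
  CLUSTER_ID    = c(0, 0, 1, 1),
  VARIABLE_NAME = c("V000", "V002", "V000", "V002"),
  VALUE         = c(1.0, 1.0, 16.0, 1.0),
  stringsAsFactors = FALSE
)

# Pivot from long to wide: one row per cluster, one column per variable.
wide <- reshape(centers,
                idvar     = "CLUSTER_ID",
                timevar   = "VARIABLE_NAME",
                direction = "wide")

# reshape() prefixes the new columns with "VALUE."; strip that prefix.
names(wide) <- sub("^VALUE\\.", "", names(wide))
rownames(wide) <- NULL

print(wide)
```

The wide layout makes it easier to compare cluster centers side by side; the long layout remains the one returned by the model.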

model DataFrame

ROW_INDEX - Model row index.
PART_INDEX - Specifically for GMM's CLUSTER_ID.
MODEL_CONTENT - Model content.

statistics DataFrame

STAT_NAME - Statistics name.
STAT_VALUE - Statistics value.

optimal.param DataFrame

PARM_NAME - Parameter name.
INT_VALUE - Integer value.
DOUBLE_VALUE - Double value.
STRING_VALUE - Character value.

Examples

The training data:

 > data.fit$Collect()
     ID  V000 V001  V002
 1    0   0.5    A   0.5
 2    1   1.5    A   0.5
 3    2   1.5    A   1.5
 4    3   0.5    A   1.5
 5    4   1.1    B   1.2
 6    5   0.5    B  15.5
 7    6   1.5    B  15.5
 8    7   1.5    B  16.5
 9    8   0.5    B  16.5
 10   9   1.2    C  16.1
 11  10  15.5    C  15.5
 12  11  16.5    C  15.5
 13  12  16.5    C  16.5
 14  13  15.5    C  16.5
 15  14  15.6    D  16.2
 16  15  15.5    D   0.5
 17  16  16.5    D   0.5
 18  17  16.5    D   1.5
 19  18  15.5    D   1.5
 20  19  15.7    A   1.6

Create a UnifiedClustering model for KMeans:

ukmeans <- hanaml.UnifiedClustering(data = data.fit,
                                    func = "KMeans",
                                    key = "ID",
                                    n.clusters = 4,
                                    init = "first.k",
                                    max.iter = 100,
                                    tol = 1.0E-6,
                                    thread.ratio = 1.0,
                                    distance.level = "Euclidean",
                                    category.weights = 0.5)

Check the labels:

> ukmeans$labels$Collect()
    ID  CLUSTER_ID  DISTANCE  SLIGHT_SILHOUETTE
1    0           0  0.891088          0.944370
2    1           0  0.863917          0.942478
3    2           0  0.806252          0.946288
4    3           0  0.835684          0.944942
......
17  16           1  0.976885          0.939386
18  17           1  0.818178          0.945878
19  18           1  0.722799          0.952170
20  19           1  1.102342          0.925679
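
Once the labels have been collected, they can be summarized locally. The sketch below is illustrative only: it rebuilds the eight rows printed above as a plain data frame (the elided rows are not reproduced), uses the column name SLIGHT_SILHOUETTE as documented under Value, and assumes `Collect()` yields an ordinary R data frame.

```r
# The eight label rows shown above, rebuilt locally for illustration.
labels <- data.frame(
  ID = c(0, 1, 2, 3, 16, 17, 18, 19),
  CLUSTER_ID = c(0, 0, 0, 0, 1, 1, 1, 1),
  DISTANCE = c(0.891088, 0.863917, 0.806252, 0.835684,
               0.976885, 0.818178, 0.722799, 1.102342),
  SLIGHT_SILHOUETTE = c(0.944370, 0.942478, 0.946288, 0.944942,
                        0.939386, 0.945878, 0.952170, 0.925679)
)

# Number of points assigned to each cluster.
sizes <- table(labels$CLUSTER_ID)

# Mean slight silhouette per cluster; values near 1 suggest tight clusters.
mean.sil <- aggregate(SLIGHT_SILHOUETTE ~ CLUSTER_ID, data = labels, FUN = mean)

# The point with the lowest slight silhouette is the weakest assignment.
worst <- labels[which.min(labels$SLIGHT_SILHOUETTE), ]

print(sizes)
print(mean.sil)
print(worst)
```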
