Like other predict methods, this function provides a unified interface for cluster assignment: it assigns new data to clusters previously generated by a clustering method, including K-Means, Accelerated K-Means, K-Medians, K-Medoids, DBSCAN, SOM, and GMM. AgglomerateHierarchicalClustering does not provide a predict function.

# S3 method for UnifiedClustering
predict(
  model,
  data,
  key = NULL,
  features = NULL,
  func = NULL,
  group.key = NULL
)

Format

S3 methods

Arguments

model

R6Class
A "hanaml.UnifiedClustering" object for prediction.

data

DataFrame
DataFrame containing the data.

key

character, optional
Name of the ID column. If not provided, the data is assumed to have no ID column.
No default value.

features

character or list of characters, optional
Names of the feature columns for prediction.
If not provided, it defaults to all non-key columns of data.

func

character, optional
The clustering function used by the unified clustering model.
Mandatory only when the func attribute of model is NULL. Valid options:

  • "DBSCAN"

  • "GaussianMixture"

  • "AcceleratedKMeans"

  • "KMeans"

  • "KMedians"

  • "KMedoids"

  • "SOM"

  • "AffinityPropagation"

group.key

character, optional
Name of the group key column. This parameter is only valid when model$massive is TRUE.
Defaults to the first column of data.

Value

Predicted values are returned as a list of DataFrames.
DataFrame 1:

  • ID: ID column.

  • CLUSTER_ID: Assigned cluster ID.

  • DISTANCE: Distance metric between a given point and the assigned cluster.

DataFrame 2:
Error messages. Only valid if massive is TRUE.
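
To make the CLUSTER_ID and DISTANCE columns concrete, the following base R sketch assigns each new point to its nearest cluster center and reports the distance to that center. It is an illustration only, not the implementation used by the library: it assumes centroid-based clusters (as in K-Means) and Euclidean distance, the helper assign_clusters, the centers, and the points are all hypothetical, and the cluster IDs are R's 1-based row indexes rather than the IDs a fitted model would return.

```r
# Hypothetical cluster centers (one row per cluster) and new points.
# All values are made up for illustration.
centers <- matrix(c(0, 0,
                    5, 5,
                    0, 5),
                  ncol = 2, byrow = TRUE)
new_points <- data.frame(ID = c(1, 2, 3),
                         X1 = c(0.5, 4.0, 1.0),
                         X2 = c(0.5, 6.0, 4.0))

# Hypothetical helper: nearest-center assignment with Euclidean distance.
assign_clusters <- function(points, centers, key = "ID") {
  # Use every non-key column as a feature.
  feats <- as.matrix(points[, setdiff(names(points), key), drop = FALSE])
  # Distance from every point (rows) to every center (columns).
  dists <- apply(centers, 1, function(ctr) {
    sqrt(rowSums(sweep(feats, 2, ctr)^2))
  })
  dists <- matrix(dists, nrow = nrow(feats))
  # Index of the smallest distance in each row.
  nearest <- max.col(-dists, ties.method = "first")
  data.frame(ID = points[[key]],
             CLUSTER_ID = nearest,
             DISTANCE = dists[cbind(seq_len(nrow(feats)), nearest)])
}

result <- assign_clusters(new_points, centers)
print(result)
```

The output has the same three columns as DataFrame 1 above. Other algorithms (for example DBSCAN or GMM) assign clusters by different rules, so this sketch only mirrors the centroid-based case.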

Examples

Input data for prediction:


> df.predict$Collect()
   ID  CLUSTER_ID  DISTANCE
1  88           3  0.981659
2  89           3  0.826454
3  90           2  1.990205
4  91           2  0.325812

Call the predict() function:


> res <- predict(model = ukmeans,
                 data = df.predict,
                 key = "ID",
                 func = "KMeans")

Check the result:


> res$Collect()
   ID  CLUSTER_ID  DISTANCE
1  88           3  0.981659
2  89           3  0.826454
3  90           2  1.990205
4  91           2  0.325812