HANA K-Means

Properties that can be configured for the HANA K-Means algorithm.

Use this algorithm to cluster observations into groups of related observations without any prior knowledge of those relationships. The algorithm partitions the observations into k groups, where k is provided as an input parameter. It assigns each observation to the cluster whose mean is nearest, recalculates the cluster means, and repeats these two steps until the clusters converge.
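The assign-and-update loop described above can be sketched in plain Python. This is a minimal illustration of textbook K-Means, not the HANA implementation. The function name kmeans, the seed and initial_centers arguments, and the random choice of starting centers are assumptions made for this example. The max_iterations and exit_threshold defaults mirror the Maximum Iterations and Exit Threshold defaults listed in Table 1.

```python
import math
import random


def kmeans(points, k, max_iterations=100, exit_threshold=1e-9,
           seed=0, initial_centers=None):
    """Cluster points (equal-length numeric tuples) into k groups."""
    rng = random.Random(seed)
    # Assumption: unless given, start from k observations picked at random.
    if initial_centers is not None:
        centers = list(initial_centers)
    else:
        centers = rng.sample(points, k)
    assignments = [0] * len(points)
    for _ in range(max_iterations):
        # Assignment step: attach each observation to its nearest center.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centers.append(centers[c])  # empty cluster: keep center
        # Stop once no center moved farther than the exit threshold.
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= exit_threshold:
            break
    return assignments, centers


points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
labels, centers = kmeans(points, k=2)
```

The loop ends either when the centers stop moving (within the threshold) or when the iteration limit is reached, whichever comes first.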
Note
  • The number assigned to each cluster might differ each time you execute the HANA K-Means algorithm. However, the observations grouped in each cluster remain the same.
  • Creating models using the HANA K-Means algorithm is not supported.
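Because cluster numbers can change between runs, comparing two runs label by label can report differences that are not real. The sketch below checks whether two labelings group the observations identically, regardless of numbering. The helper name same_partition is hypothetical and not part of any HANA API.

```python
def same_partition(labels_a, labels_b):
    """Return True if two labelings group the observations identically,
    ignoring which number each cluster happens to receive."""
    if len(labels_a) != len(labels_b):
        return False
    a_to_b, b_to_a = {}, {}
    for a, b in zip(labels_a, labels_b):
        # Each label in one run must map to exactly one label in the other.
        if a_to_b.setdefault(a, b) != b or b_to_a.setdefault(b, a) != a:
            return False
    return True
```

For example, the labelings [0, 0, 1, 1, 2] and [2, 2, 0, 0, 1] describe the same clusters with different numbers, so the function returns True for that pair.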
HANA K-Means Properties
Table 1: Algorithm Properties
Property | Description
Output Mode | Select the mode in which you want to use the output of this algorithm.
Features | Select the input columns on which you want to perform the analysis.
Category Columns | Select the input columns that you want to treat as categorical columns.
Categorical Weights | Enter the weights to apply to the categorical columns.
Calculate Silhouette | Select this option to calculate silhouette values. The silhouette value indicates the quality of the clustering: a value close to 1 signifies good clustering, and a value close to 0 signifies poor clustering.
Missing Values | Select the method for handling missing values. Possible methods:
  • Ignore: The algorithm skips records that contain missing values in the independent or dependent columns.
  • Keep: The algorithm retains records that contain missing values during calculation.
Number of Clusters | Enter the number of clusters (k) into which the observations are grouped. The default value is 5.
Cluster Name | Enter a name for the newly created column that contains the cluster name.
Distance | Enter a name for the newly created column that contains the distance of each observation from the centroid of its cluster.
Maximum Iterations | Enter the maximum number of iterations allowed for finding clusters. The default value is 100.
Center Calculation Method | Select the method to be used for calculating the initial cluster centers.
Distance Measure | Specify the method for calculating the distance between an observation and the cluster center.
Normalization Type | Select the type of normalization.
Number of Threads | Enter the number of threads that can be used for execution. The default value is 1.
Exit Threshold | Enter the threshold value for exiting from the iterations. The default value is 0.000000001.
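To make the Calculate Silhouette property more concrete, the following sketch computes the textbook mean silhouette coefficient with Euclidean distance. It is an illustration only: the function name mean_silhouette is hypothetical, and the exact formula that HANA uses may differ from this definition.

```python
import math


def mean_silhouette(points, labels):
    """Mean silhouette coefficient over all observations (textbook form)."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the distance from p to itself is 0, so it adds nothing).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

In this textbook form, the score rises toward 1 when observations sit much closer to their own cluster than to the nearest other cluster, and it falls when clusters overlap or observations are assigned to the wrong cluster.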