HANA DB Scan

Properties that can be configured for the HANA DB Scan algorithm.

Syntax

HANA DB Scan is based on DBSCAN (Density-Based Spatial Clustering of Applications with Noise), a density-based data clustering algorithm. It finds clusters from the estimated density distribution of the data points, so the number of clusters does not need to be specified in advance.

DB Scan requires two parameters: the scan radius (eps) and the minimum number of points required to form a cluster (minPts). The algorithm starts with an arbitrary point that has not been visited and retrieves its eps-neighborhood. If the neighborhood contains at least minPts points, a cluster is started and then expanded by repeating the same check for every point in the neighborhood. Otherwise, the point is labeled as noise, although it can still join a cluster later if it falls inside the eps-neighborhood of a point that does meet the minPts threshold. Because these two parameters strongly influence the result, they are usually set by the user.
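The procedure described above can be sketched in plain Python. This is a textbook DBSCAN written for illustration only, not the PAL implementation: it assumes Euclidean distance, counts a point as part of its own eps-neighborhood, and uses hypothetical function names (`dbscan`, `region_query`) and the label -1 for noise.

```python
import math
from typing import List, Optional, Sequence

NOISE = -1  # label used for points that belong to no cluster


def region_query(points: Sequence[Sequence[float]], idx: int, eps: float) -> List[int]:
    """Return the indices of all points within distance eps of points[idx], including idx itself."""
    center = points[idx]
    return [j for j, q in enumerate(points) if math.dist(center, q) <= eps]


def dbscan(points: Sequence[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Assign each point a cluster id (0, 1, ...) or NOISE."""
    labels: List[Optional[int]] = [None] * len(points)  # None = not visited yet
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already visited
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        # i meets the minPts threshold: start a new cluster and expand it
        labels[i] = cluster_id
        seeds = [j for j in neighbors if j != i]
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: reachable, but too sparse to expand
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is dense enough: keep expanding from it
        cluster_id += 1
    # every point has been visited, so no None remains
    return [NOISE if lab is None else lab for lab in labels]


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```

The isolated point (50, 50) has fewer than minPts points in its neighborhood and is never reached from a cluster, so it stays labeled as noise. The nested loop makes this sketch O(n²); production implementations typically use a spatial index for the neighborhood queries.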

The SAP HANA Predictive Analysis Library (PAL) provides a method to determine these two parameters automatically. You can either specify the parameters yourself or let the system determine them for you.
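This documentation does not describe how PAL chooses the parameters. As an illustration only, the following sketch uses a commonly cited rule of thumb instead: minPts is set to twice the number of dimensions (at least 3), and eps is taken at the "knee" of the sorted k-distance curve, with k = minPts - 1. The function names and the knee-detection rule (the point farthest from the straight line joining the curve's endpoints) are assumptions, not PAL's method.

```python
import math
from typing import List, Sequence, Tuple


def sorted_k_distances(points: Sequence[Sequence[float]], k: int) -> List[float]:
    """Distance from each point to its k-th nearest other point, sorted ascending.

    Requires more than k points.
    """
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)


def knee_value(values: List[float]) -> float:
    """Value at the point of a sorted curve farthest from the chord joining its endpoints."""
    n = len(values)
    dy = values[-1] - values[0]
    best_i, best_score = 0, -1.0
    for i, y in enumerate(values):
        # |cross product| is proportional to the perpendicular distance to the chord
        score = abs(dy * i - (n - 1) * (y - values[0]))
        if score > best_score:
            best_i, best_score = i, score
    return values[best_i]


def suggest_parameters(points: Sequence[Sequence[float]]) -> Tuple[int, float]:
    """Heuristic (minPts, eps) suggestion; not the PAL algorithm."""
    dims = len(points[0])
    min_pts = max(3, 2 * dims)  # rule of thumb: twice the dimensionality
    k = min_pts - 1  # neighbors excluding the point itself
    eps = knee_value(sorted_k_distances(points, k))
    return min_pts, eps


points = [(0, 0), (0, 1), (1, 0), (1, 1), (0, 2), (1, 2), (10, 10), (20, 0)]
min_pts, eps = suggest_parameters(points)
print(min_pts, round(eps, 3))  # 4 1.414
```

Points inside a dense region have small k-distances and outliers have large ones, so the sharp bend in the sorted curve is a reasonable radius that separates the two.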

HANA DB Scan Properties
Table 1: Algorithm Properties
Property | Description
Output Mode | Select the mode in which you want to use the output of this algorithm.
Define Parameters Automatically | To let the algorithm determine the minimum number of points (minPts) and the scan radius (eps) automatically, select True; otherwise, select False.
Features | Select the input columns with which you want to perform the analysis.
Calculate Silhouette | Select this option to calculate silhouette values, which indicate the quality of the clustering. Silhouette values range from -1 to 1: a value close to 1 indicates good clustering, and values close to 0 or below indicate poor clustering.
Cluster Name | Enter a name for the new column that contains the cluster numbers for the given dataset (cluster).
Missing Values | Select the method for handling missing values. Possible methods:
  • Ignore: The algorithm skips records containing missing values in the independent or dependent columns.
  • Keep: The algorithm retains records containing missing values during the calculation.
Distance Measure | Select the metric used to compute the distance between data points.
Number of Threads | Enter the number of threads the algorithm should use for execution. The default value is 1.
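For reference, the following sketch computes the mean silhouette value of a clustering result in plain Python, using the standard definition s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from point i to the other members of its cluster and b(i) is the smallest mean distance from point i to the members of any other cluster. It assumes Euclidean distance, excludes noise points (label -1), and scores single-member clusters as 0; whether PAL treats noise the same way is not stated here. The function name `mean_silhouette` is hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def mean_silhouette(points: Sequence[Sequence[float]], labels: Sequence[int],
                    noise_label: int = -1) -> float:
    """Mean silhouette value over all non-noise points; ranges from -1 to 1."""
    clusters: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        if label != noise_label:  # assumption: noise points are excluded
            clusters[label].append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    def mean_dist(i: int, members: List[int]) -> float:
        """Mean distance from point i to the given members, excluding i itself."""
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    scores = []
    for label, members in clusters.items():
        for i in members:
            if len(members) == 1:
                scores.append(0.0)  # convention for single-member clusters
                continue
            a = mean_dist(i, members)  # cohesion within the point's own cluster
            b = min(mean_dist(i, other)  # separation from the nearest other cluster
                    for other_label, other in clusters.items() if other_label != label)
            denom = max(a, b)
            scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(round(mean_silhouette(pts, [0, 0, 1, 1]), 2))  # 0.9: well-separated clusters
print(mean_silhouette(pts, [0, 1, 0, 1]) < 0)        # True: points in the wrong clusters
```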