Properties that can be configured for the HANA DB Scan algorithm.
HANA DB Scan implements DBSCAN (Density-Based Spatial Clustering of Applications with Noise), a density-based data clustering algorithm. It identifies clusters from the estimated density distribution of the data points.
DB Scan requires two parameters: the scan radius (eps) and the minimum number of points required to form a cluster (minPts). The algorithm starts with an arbitrary point that has not been visited and retrieves that point's eps-neighborhood. If the neighborhood contains at least minPts points, a cluster is started and then expanded by repeating the same check for each point in the neighborhood. Otherwise, the point is labeled as noise; it can still become part of a cluster later if it lies within the eps-neighborhood of a point that belongs to one. These two parameters strongly affect the result and are usually set by the user.
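The procedure described above can be illustrated with a minimal pure-Python sketch of textbook DBSCAN. This is not the PAL implementation: it assumes Euclidean distance, counts a point as part of its own eps-neighborhood, and uses hypothetical names (`dbscan`, `region_query`, label `-1` for noise) chosen only for illustration.

```python
import math
from typing import List, Optional, Sequence

Point = Sequence[float]
NOISE = -1  # hypothetical label for noise points


def region_query(points: List[Point], i: int, eps: float) -> List[int]:
    """Return indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: List[Point], eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster number (0, 1, ...) or NOISE."""
    labels: List[Optional[int]] = [None] * len(points)  # None = not visited yet
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        # i is a core point: start a new cluster and expand it.
        labels[i] = cluster_id
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster_id  # former noise becomes a border point
            if labels[j] is not None:
                continue  # already assigned; border points are not expanded
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also a core point
        cluster_id += 1
    return labels  # type: ignore[return-value]


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(pts, eps=1.5, min_pts=3))
```

The neighborhood search here is a linear scan, so the sketch runs in quadratic time; it is meant to show the labeling logic, not to scale.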
PAL (the SAP HANA Predictive Analysis Library) provides a method to determine these two parameters automatically. You can either specify the parameters yourself or let the system determine them for you.
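The page does not describe how PAL chooses the parameters. For intuition only, the following sketch uses a widely cited heuristic instead: set minPts to twice the number of features, sort every point's distance to its (minPts − 1)-th nearest neighbor, and take eps at the "knee" of that curve (the point farthest from the chord joining its ends). The function names are hypothetical, and this is explicitly not the PAL procedure.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def k_distances(points: List[Point], k: int) -> List[float]:
    """Sorted distances from each point to its k-th nearest other point."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)


def suggest_parameters(points: List[Point]) -> Tuple[int, float]:
    """Heuristic (minPts, eps) suggestion; NOT the PAL algorithm."""
    dim = len(points[0])
    min_pts = 2 * dim  # rule of thumb: twice the number of features
    # The eps-neighborhood counts the point itself, so use the (minPts - 1)-th neighbor.
    kd = k_distances(points, min_pts - 1)
    n = len(kd)
    x1, y0, y1 = n - 1, kd[0], kd[-1]

    def gap(i: int) -> float:
        # Unnormalized distance from (i, kd[i]) to the chord from (0, y0) to (x1, y1).
        return abs((y1 - y0) * i - x1 * kd[i] + x1 * y0)

    knee = max(range(n), key=gap)
    return min_pts, kd[knee]


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
print(suggest_parameters(pts))
```

The sketch requires at least minPts points. Knee detection on the k-distance curve is only a starting point; inspecting the sorted curve directly is often more reliable on real data.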
| Property | Description |
|---|---|
| Output Mode | Select the mode in which you want to use the output of this algorithm. |
| Define Parameters Automatically | To let the algorithm determine the minimum number of points and the scan radius automatically, select True; otherwise, select False. |
| Features | Select the input columns to use for the analysis. |
| Calculate Silhouette | Select this option to calculate silhouette values, which indicate the quality of the clustering. A silhouette value of 1 signifies good clustering, and a value of 0 signifies poor clustering. |
| Cluster Name | Enter a name for the new column that contains the cluster numbers for the given dataset (cluster). |
| Missing Values | Select the method for handling missing values. Possible methods: |
| Distance Measure | Select the metric used to compute the distance between data points. |
| Number of Threads | Enter the number of threads the algorithm should use for execution. The default value is 1. |
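To show what the silhouette and distance options measure, here is a minimal sketch of the standard silhouette coefficient with a pluggable distance function. Assumptions: noise points (label `-1`) are excluded, which may differ from how PAL treats them; a point in a single-member cluster scores 0; and `mean_silhouette` and `manhattan` are hypothetical names. In the standard definition, individual values range from -1 to 1, so a clearly wrong assignment can score below 0.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, List, Sequence

Point = Sequence[float]


def manhattan(p: Point, q: Point) -> float:
    """Alternative distance measure: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(p, q))


def mean_silhouette(points: List[Point], labels: List[int],
                    dist: Callable[[Point, Point], float] = math.dist) -> float:
    """Average silhouette over clustered points; label -1 (noise) is ignored."""
    clusters: Dict[int, List[Point]] = defaultdict(list)
    for p, lab in zip(points, labels):
        if lab != -1:
            clusters[lab].append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for lab, members in clusters.items():
        for p in members:
            if len(members) == 1:
                scores.append(0.0)  # convention for singleton clusters
                continue
            # a: mean distance to the other members of p's own cluster.
            a = sum(dist(p, q) for q in members) / (len(members) - 1)
            # b: smallest mean distance to the members of any other cluster.
            b = min(sum(dist(p, q) for q in others) / len(others)
                    for other, others in clusters.items() if other != lab)
            m = max(a, b)
            scores.append((b - a) / m if m > 0 else 0.0)
    return sum(scores) / len(scores)


pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(mean_silhouette(pts, [0, 0, 1, 1]))             # well-separated clusters
print(mean_silhouette(pts, [0, 1, 0, 1], manhattan))  # poor assignment
```

Swapping `dist` changes only how distances are computed, which mirrors the role of the Distance Measure property; the silhouette formula itself stays the same.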