HANA Self-Organizing Maps

Properties that can be configured for the HANA Self-Organizing Maps algorithm.

Overview

A self-organizing map (SOM) or self-organizing feature map (SOFM) is a type of artificial neural network that is trained using unsupervised learning to produce a low-dimensional (typically two-dimensional), discretized representation of the input space of the training samples, called a map. Self-organizing maps are different from other artificial neural networks in that they use a neighborhood function to preserve the topological properties of the input space.

This makes SOMs useful for visualizing low-dimensional views of high-dimensional data, akin to multi-dimensional scaling. The model was first described as an artificial neural network by the Finnish professor Teuvo Kohonen, and is sometimes called a Kohonen map. Like most artificial neural networks, SOMs operate in two modes: training and mapping. Training builds the map using input examples. It is a competitive process, also called vector quantization. Mapping automatically classifies a new input vector.
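
The two modes described above can be sketched in a few lines of Python. This is a minimal illustration of a generic Kohonen-style SOM, not the HANA implementation: the function names `train_som` and `best_matching_unit`, the Gaussian neighborhood function, the linear decay of the learning rate and radius, and the sample data are all assumptions made for the example. The default arguments simply echo the defaults of the Map Height, Map Width, Alpha, and Maximum Iterations properties.

```python
import math
import random


def best_matching_unit(weights, x):
    """Mapping mode: return the grid position whose weight vector is closest to x."""
    return min(weights, key=lambda node: math.dist(weights[node], x))


def train_som(data, height=5, width=5, alpha=0.5, max_iter=100, seed=0):
    """Training mode: fit a rectangular map and return {(row, col): weight_vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Start every node with a random weight vector in [0, 1).
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(height) for c in range(width)}
    sigma0 = max(height, width) / 2.0  # assumed initial neighborhood radius
    for t in range(max_iter):
        frac = t / max_iter
        lr = alpha * (1.0 - frac)                # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # radius shrinks, floored at 0.5
        for x in rng.sample(data, len(data)):    # visit samples in shuffled order
            bmu = best_matching_unit(weights, x)  # competitive step: pick the winner
            for node, w in weights.items():
                # Gaussian neighborhood: nodes near the winner on the grid move more.
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma * sigma))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return weights


# Two small, well-separated groups of 2-D points (made-up data).
samples = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
           [0.9, 1.0], [1.0, 0.9], [0.95, 0.95]]
som = train_som(samples)
node_a = best_matching_unit(som, [0.05, 0.05])
node_b = best_matching_unit(som, [0.95, 0.95])
print(node_a, node_b)
```

The neighborhood weight `h` is what distinguishes this from plain vector quantization: because grid neighbors of the winning node are also pulled toward each sample, nearby nodes end up representing similar inputs, which is how the map preserves topology.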

The SOM approach has many applications, such as visualization, web document clustering, and speech recognition.

HANA Self-Organizing Maps Properties
Table 1: Algorithm Properties
Property: Description
Map Height: Enter the map height. The default value is 5.
Map Width: Enter the map width. The default value is 5.
Alpha: Enter a value for the learning rate. The default value is 0.5.
Map Shape: Select the map shape.
Features: Select the input columns on which you want to perform the analysis.
Calculate Silhouette: Select this option to calculate silhouette values, which indicate the quality of the clustering. A value close to 1 indicates good clustering; a value close to 0 indicates poor clustering.
Cluster Name: Enter a name for the new column that contains the cluster numbers for the given dataset.
Missing Values: Select the method for handling missing values. Possible methods:
  • Ignore: The algorithm skips records that contain missing values in the independent or dependent columns.
  • Keep: The algorithm retains records that contain missing values during the calculation.
Normalization Type: Select the type of normalization. Possible types:
  • Normalization not required
  • New range normalization (min-max scaling)
  • Zero score normalization (z-score standardization)
Random Seed: Enter the seed value for the random number generator used in the calculation. If you enter -1, the algorithm chooses a seed itself. The default value is -1.
Maximum Iterations: Enter the maximum number of iterations the algorithm may run when finding clusters. The default value is 100.
Number of Threads: Enter the number of threads that the algorithm should use during execution. The default value is 2.
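
To make the Normalization Type and Calculate Silhouette properties more concrete, the following Python sketch shows what such options commonly compute. It rests on assumptions: "new range" is taken to mean min-max scaling into a target range, "zero score" is taken to mean z-score standardization, and the silhouette follows the standard textbook definition, which ranges from -1 to 1. The exact formulas HANA uses are not stated here and may differ. The function names and the sample data are hypothetical.

```python
import math
import statistics


def min_max(column, new_min=0.0, new_max=1.0):
    """Rescale a column linearly into [new_min, new_max] (assumed 'new range')."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [new_min for _ in column]  # constant column: nothing to scale
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in column]


def z_score(column):
    """Centre a column on 0 with unit population standard deviation."""
    mean = statistics.fmean(column)
    sd = statistics.pstdev(column)
    if sd == 0:
        return [0.0 for _ in column]  # constant column: all values equal the mean
    return [(v - mean) / sd for v in column]


def silhouette(points, labels):
    """Mean silhouette over all points, using Euclidean distance."""
    scores = []
    for i, p in enumerate(points):
        # Collect distances from p to every other point, grouped by cluster label.
        dists = {}
        for j, q in enumerate(points):
            if i != j:
                dists.setdefault(labels[j], []).append(math.dist(p, q))
        own = dists.get(labels[i])
        others = [sum(d) / len(d) for lab, d in dists.items() if lab != labels[i]]
        if not own or not others:
            scores.append(0.0)  # singleton cluster, or only one cluster overall
            continue
        a = sum(own) / len(own)  # mean distance to the point's own cluster
        b = min(others)          # mean distance to the nearest other cluster
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)


# Made-up data: two tight groups that are far apart.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
good = silhouette(points, [0, 0, 1, 1])  # labels follow the two groups
bad = silhouette(points, [0, 1, 0, 1])   # labels cut across the groups
print(good, bad)
```

Normalizing before training matters because the SOM compares records by distance, so a column with a large numeric range would otherwise dominate the choice of winning node.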