Self-Organizing Maps

hanaml.SOM is an R wrapper for the SAP HANA PAL Self-Organizing Maps algorithm.

hanaml.SOM(
  data = NULL,
  key = NULL,
  features = NULL,
  tol = NULL,
  normalization = NULL,
  random.state = NULL,
  height.of.map = NULL,
  width.of.map = NULL,
  kernel = NULL,
  alpha = NULL,
  learning.rate = NULL,
  grid = NULL,
  radius = NULL,
  batch.som = NULL,
  max.iter = NULL,
  decay = NULL
)

Arguments

data

DataFrame
DataFrame containing the data.

key

character
Name of ID column.

features

character or list of characters, optional
Names of the features columns.

tol

double, optional
If the largest difference between successive maps is less than this value, the calculation is regarded as converged and SOM training stops.
Defaults to 1.0e-6.

normalization

character, optional
Normalization type:

"no": no normalization.
"min.max": Transform to new range: 0 to 1.
"z.score": Z-score normalization

Defaults to "no".
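
The two non-trivial normalization types can be illustrated in base R. This is a minimal sketch on a toy vector, not the PAL implementation; in particular, the use of the sample standard deviation for "z.score" is an assumption.

```r
# Toy feature column; values taken from the first rows of the example data.
x <- c(0.10, 0.22, 0.30, 0.40)

# "min.max": rescale so the smallest value maps to 0 and the largest to 1.
min_max <- (x - min(x)) / (max(x) - min(x))

# "z.score": centre on the mean and divide by the standard deviation.
# Assumption: sd() is the sample standard deviation (n - 1 denominator);
# PAL may use a different denominator.
z_score <- (x - mean(x)) / sd(x)

print(min_max)
print(z_score)
```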

random.state

integer, optional

-1: Random
0: Sets every weight to zero
Other value: Uses this value as seed

Defaults to -1.

height.of.map

integer, optional
Indicates the height of the map.
Defaults to 10.

width.of.map

integer, optional
Indicates the width of the map.
Defaults to 10.

kernel

character, optional
Represents the neighborhood kernel function.

"gaussian": Gaussian kernel function
"bubble": Bubble/Flat kernel function.

Defaults to "gaussian".
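
For intuition, the two kernel shapes can be sketched in base R using their common textbook forms. The formulas and the helper names gaussian_kernel and bubble_kernel are assumptions for illustration; this page does not state the exact expressions PAL uses.

```r
# Textbook neighborhood kernels as functions of the grid distance d between
# a unit and the BMU, for a given radius. Assumed forms, not taken from PAL.
gaussian_kernel <- function(d, radius) exp(-d^2 / (2 * radius^2))
bubble_kernel   <- function(d, radius) as.numeric(d <= radius)

d <- 0:4
print(round(gaussian_kernel(d, radius = 2), 3))  # smooth fall-off with distance
print(bubble_kernel(d, radius = 2))              # 1 inside the radius, 0 outside
```

The Gaussian kernel updates all neighbours with a weight that shrinks smoothly with distance, while the bubble kernel updates every unit inside the radius equally and ignores the rest.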

alpha

double, optional
Specifies the learning rate.
Defaults to 0.5.

learning.rate

character, optional (deprecated)
Indicates the decay function for learning rate:

"exponential"
"linear"

Will be replaced by decay in a future release.
Defaults to "exponential".

grid

character, optional
Indicates the shape of the grid.

"rectangle"
"hexagon"

Defaults to "hexagon".

radius

double, optional
Specifies the scan radius.
Defaults to the bigger value of height.of.map and width.of.map.

batch.som

logical, optional
Indicates whether batch SOM is carried out.
Note that for batch SOM, the neighborhood kernel is always Gaussian, and the learning rate factors have no effect.
Defaults to FALSE.

max.iter

integer, optional
Maximum number of iterations.
Note that the training might not converge if this value is too small, for example, less than 1000.
Defaults to 1000 plus 500 times the number of neurons in the lattice.
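
The default can be worked out by hand. The sketch below assumes that the number of neurons in the lattice equals height.of.map times width.of.map; default_max_iter is a hypothetical helper, not part of hana.ml.r.

```r
# Hypothetical helper: evaluates the documented default for max.iter,
# i.e. 1000 plus 500 times the number of neurons.
# Assumption: number of neurons = height.of.map * width.of.map.
default_max_iter <- function(height.of.map = 10, width.of.map = 10) {
  neurons <- height.of.map * width.of.map
  1000 + 500 * neurons
}

print(default_max_iter())      # default 10 x 10 map
print(default_max_iter(4, 4))  # 4 x 4 map
```

Under that assumption the default is 51000 iterations for the default 10 x 10 map and 9000 for a 4 x 4 map.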

decay

character, optional
Indicates the decay function for learning rate:

"exponential"
"linear"

If both learning.rate and decay are set, decay takes precedence.
Defaults to "exponential".

Value

An R6 object of class "SOM", with the following attributes and methods:
Attributes

map : DataFrame
The map after training. The structure is as follows:
- 1st column: CLUSTER_ID, int. Unit cell ID.
- Other columns except the last one: FEATURE (in input data) column with prefix "WEIGHT_", float. Weight vectors used to simulate the original tuples.
- Last column: COUNT, int. Number of original tuples that every unit cell contains.
labels : DataFrame
The label of input data, the structure is as follows:
- 1st column: same name and data type as the ID column of the input table. ID of the original tuples.
- 2nd column: BMU, int. Best match unit (BMU).
- 3rd column: DISTANCE, float. The distance between the tuple and its BMU.
- 4th column: SECOND_BMU, int. Second BMU.
- 5th column: IS_ADJACENT, int. Indicates whether the BMU and the second BMU are adjacent.
  - 0: Not adjacent
  - 1: Adjacent
model : DataFrame
The SOM model.
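
The BMU, DISTANCE and SECOND_BMU columns of labels can be illustrated in base R. This is a minimal sketch that assumes Euclidean distance and uses only three of the weight vectors from the example map, so it is not a reproduction of PAL's output. IS_ADJACENT depends on the grid topology and is not computed here.

```r
# Three weight vectors copied from the example map output (CLUSTER_ID 0, 1, 15).
weights <- rbind(
  c(52.8376884, 53.4653266),
  c(50.1502513, 49.2452261),
  c(0.3059696, 0.4737271)
)
cluster_id <- c(0, 1, 15)

# One input tuple (TRANS_ID 16 in the example data).
x <- c(50.10, 50.20)

# Assumption: Euclidean distance between the tuple and each weight vector.
d <- sqrt(rowSums(sweep(weights, 2, x)^2))

ranked <- order(d)
bmu <- cluster_id[ranked[1]]         # closest unit
second_bmu <- cluster_id[ranked[2]]  # next closest unit
distance <- d[ranked[1]]             # distance to the BMU

print(c(BMU = bmu, SECOND_BMU = second_bmu))
print(distance)
```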

Methods

CreateModelState(model=NULL, algorithm=NULL, func=NULL, state.description="ModelState", force=FALSE)

Usage:


   > som <- hanaml.SOM(data=df, key="ID")
   > som$CreateModelState()

Arguments:

model: DataFrame
DataFrame containing the model for parsing.
Defaults to self$model.
algorithm: character
Specifies the PAL algorithm associated with model.
Defaults to self$pal.algorithm.
func: character
Specifies the functionality for Unified Classification/Regression.
Valid only for object instance of R6Class "UnifiedClassification" or "UnifiedRegression".
Defaults to self$func.
state.description: character
A summary string for the generated model state.
Defaults to "ModelState".
force: logical
Specifies whether or not to replace the existing state for the model.
Defaults to FALSE.

After calling this method, the parsed model information is assigned to the state attribute of the corresponding R6 object.

DeleteModelState(state=NULL)

Usage:
Assuming we have trained a hanaml model and created its model state, like the following:


   > som <- hanaml.SOM(data=df, key="ID")
   > som$CreateModelState()

After using the model state for real-time scoring, we can delete the state by calling:


   > som$DeleteModelState()

Arguments:

state: DataFrame
DataFrame containing the state info.
Defaults to self$state.

After calling this method, the specified model state is cleaned up and the associated memory is released.

Examples

Input DataFrame data:


> data$Collect()
   TRANS_ID  V000  V001
1         0  0.10  0.20
2         1  0.22  0.25
3         2  0.30  0.40
4         3  0.40  0.50
......
16       15 49.00 40.10
17       16 50.10 50.20
18       17 50.20 48.30
19       18 55.30 50.40
20       19 50.40 56.50

Call the function:


> som <- hanaml.SOM(data,
                    key = "TRANS_ID",
                    tol = 1.0e-6,
                    normalization = "no",
                    random.state = 1,
                    height.of.map = 4,
                    width.of.map = 4,
                    kernel = "gaussian",
                    learning.rate = "exponential",
                    grid = "hexagon",
                    batch.som = FALSE,
                    max.iter = 4000)

Output:


> som$map$Collect()
   CLUSTER_ID  WEIGHT_V000 WEIGHT_V001 COUNT
 1           0  52.8376884  53.4653266     2
 2           1  50.1502513  49.2452261     2
 3           2  18.5976067  27.1745897     0
 4           3   1.2676711  15.2676711     3
 5           4  49.0000211  40.1000986     1
 6           5  33.4309941  34.4504915     0
 7           6   3.4999807  15.8999720     1
 8           7   2.2000010  11.2000508     1
 9           8  24.8149919  11.7383922     0
 10          9  20.6962530   8.3151278     0
 11         10   9.8170045   6.4531495     0
 12         11   1.1444060   4.2462051     0
 13         12  16.4690145   1.5680112     3
 14         13  12.7482412   1.7532663     2
 15         14   3.8868516   0.8463565     0
 16         15   0.3059696   0.4737271     5
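
As a quick sanity check on the map output above, the COUNT column should add up to the number of input tuples. The sketch below copies the counts by hand into a plain R vector, so it runs without a HANA connection.

```r
# COUNT column copied from the map output above (CLUSTER_ID 0 to 15).
count <- c(2, 2, 0, 3, 1, 0, 1, 1, 0, 0, 0, 0, 3, 2, 0, 5)

total_tuples <- sum(count)       # tuples assigned across all unit cells
empty_cells  <- sum(count == 0)  # unit cells that attracted no tuple

print(total_tuples)
print(empty_cells)
```

The total is 20, which matches the 20 rows of the input data, and 7 of the 16 unit cells are empty.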
