Self-Organizing Maps

hanaml.SOM is an R wrapper for the SAP HANA PAL Self-Organizing Maps algorithm.

hanaml.SOM(
  data = NULL,
  key = NULL,
  features = NULL,
  tol = NULL,
  normalization = NULL,
  random.state = NULL,
  height.of.map = NULL,
  width.of.map = NULL,
  kernel = NULL,
  alpha = NULL,
  learning.rate = NULL,
  grid = NULL,
  radius = NULL,
  batch.som = NULL,
  max.iter = NULL,
  decay = NULL
)

Arguments

data	`DataFrame` DataFrame containing the data.
key	`character` Name of ID column.
features	`character or list of characters, optional` Names of the feature columns.
tol	`double, optional` Convergence tolerance. If the largest difference between successive maps is less than this value, the calculation is regarded as converged and SOM training stops. Defaults to 1.0e-6.
normalization	`character, optional` Normalization type: `"no"`: No normalization. `"min.max"`: Transform to a new range of 0 to 1. `"z.score"`: Z-score normalization. Defaults to "no".
random.state	`integer, optional` `-1`: Random. `0`: Sets every weight to zero. `Other value`: Uses this value as the seed. Defaults to -1.
height.of.map	`integer, optional` Indicates the height of the map. Defaults to 10.
width.of.map	`integer, optional` Indicates the width of the map. Defaults to 10.
kernel	`character, optional` Represents the neighborhood kernel function. `"gaussian"`: Gaussian kernel function. `"bubble"`: Bubble/Flat kernel function. Defaults to "gaussian".
alpha	`double, optional` Specifies the learning rate. Defaults to 0.5.
learning.rate	`character, optional(deprecated)` Indicates the decay function for the learning rate: `"exponential"` or `"linear"`. Will be replaced by `decay` in a future release. Defaults to "exponential".
grid	`character, optional` Indicates the shape of the grid: `"rectangle"` or `"hexagon"`. Defaults to "hexagon".
radius	`double, optional` Specifies the scan radius. Defaults to the bigger value of height.of.map and width.of.map.
batch.som	`logical, optional` Indicates whether batch SOM is carried out. Note that for batch SOM, the kernel function (`kernel`) is always Gaussian, and the learning rate factors have no effect. Defaults to FALSE.
max.iter	`integer, optional` Maximum number of iterations. Note that the training might not converge if this value is too small, for example, less than 1000. Defaults to 1000 plus 500 times the number of neurons in the lattice.
decay	`character, optional` Indicates the decay function for learning rate: `"exponential"` `"linear"` If both `learning.rate` and `decay` are set, `decay` takes precedence. Defaults to "exponential".
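
The derived defaults described above (`max.iter`, `radius`) and the `"min.max"` normalization can be sketched in plain R. This is only an illustration of the documented rules, not code from the package: the helper names `default_max_iter`, `default_radius`, and `min_max` are hypothetical, and PAL performs these steps internally.

```r
# Hypothetical helpers, not part of hana.ml.r: they only mirror the
# defaults documented in the argument list.

default_max_iter <- function(height.of.map = 10, width.of.map = 10) {
  # "1000 plus 500 times the number of neurons in the lattice"
  1000 + 500 * height.of.map * width.of.map
}

default_radius <- function(height.of.map = 10, width.of.map = 10) {
  # "the bigger value of height.of.map and width.of.map"
  max(height.of.map, width.of.map)
}

# Min-max scaling to the range 0 to 1, as described for
# normalization = "min.max". Assumes the column is not constant.
min_max <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

default_max_iter()       # 51000 for the default 10 x 10 map
default_max_iter(4, 4)   # 9000 for a 4 x 4 map
default_radius(4, 6)     # 6
min_max(c(0.10, 0.22, 0.30, 0.40))
```

For a small map, the default iteration count is already well above the 1000-iteration floor that the `max.iter` note warns about.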

Value

A "SOM" object with the following attributes:

map : DataFrame
The map after training. The structure is as follows:
- 1st column: CLUSTER_ID, int. Unit cell ID.
- Other columns except the last one: FEATURE (in input data) column with prefix "WEIGHT_", float. Weight vectors used to simulate the original tuples.
- Last column: COUNT, int. Number of original tuples that every unit cell contains.
labels : DataFrame
The labels of the input data. The structure is as follows:
- 1st column: ID column, with the same name and data type as the ID column of the input table. ID of the original tuples.
- 2nd column: BMU, int. Best match unit (BMU).
- 3rd column: DISTANCE, float. The distance between the tuple and its BMU.
- 4th column: SECOND_BMU, int. Second BMU.
- 5th column: IS_ADJACENT, int. Indicates whether the BMU and the second BMU are adjacent.
  - 0: Not adjacent
  - 1: Adjacent
model : DataFrame
The SOM model.
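
As a rough illustration of how the `labels` structure described above might be summarized after collecting it to the client, the sketch below works on a small made-up data frame. The values are invented for illustration and are not output of `hanaml.SOM`; only the column names follow the documented structure, with `TRANS_ID` standing in for the ID column.

```r
# Made-up stand-in for a collected labels table (e.g. som$labels$Collect());
# the values are illustrative only.
labels <- data.frame(
  TRANS_ID    = 0:5,
  BMU         = c(15L, 15L, 3L, 3L, 0L, 15L),
  DISTANCE    = c(0.3, 0.1, 0.4, 0.2, 0.5, 0.2),
  SECOND_BMU  = c(14L, 11L, 7L, 6L, 1L, 14L),
  IS_ADJACENT = c(1L, 0L, 1L, 1L, 1L, 1L)
)

# Number of tuples assigned to each unit cell. For real output this would
# be expected to agree with the COUNT column of the map.
counts <- table(labels$BMU)

# Mean distance between tuples and their BMU.
mean_distance <- mean(labels$DISTANCE)

# Share of tuples whose BMU and second BMU are not adjacent.
non_adjacent_share <- mean(labels$IS_ADJACENT == 0)

print(counts)
print(mean_distance)
print(non_adjacent_share)
```

A high share of non-adjacent BMU pairs usually suggests that the map preserves the topology of the data poorly, which is why `IS_ADJACENT` is worth inspecting alongside `DISTANCE`.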

Examples

Input DataFrame data:

> data$Collect()
   TRANS_ID  V000  V001
1         0  0.10  0.20
2         1  0.22  0.25
3         2  0.30  0.40
4         3  0.40  0.50
......
16       15 49.00 40.10
17       16 50.10 50.20
18       17 50.20 48.30
19       18 55.30 50.40
20       19 50.40 56.50

Call the function:

> som <- hanaml.SOM(data,
                    key = "TRANS_ID",
                    tol = 1.0e-6,
                    normalization = "no",
                    random.state = 1,
                    height.of.map = 4,
                    width.of.map = 4,
                    kernel = "gaussian",
                    learning.rate = "exponential",
                    grid = "hexagon",
                    batch.som = FALSE,
                    max.iter = 4000)

Output:

> som$map$Collect()
   CLUSTER_ID  WEIGHT_V000 WEIGHT_V001 COUNT
 1           0  52.8376884  53.4653266     2
 2           1  50.1502513  49.2452261     2
 3           2  18.5976067  27.1745897     0
 4           3   1.2676711  15.2676711     3
 5           4  49.0000211  40.1000986     1
 6           5  33.4309941  34.4504915     0
 7           6   3.4999807  15.8999720     1
 8           7   2.2000010  11.2000508     1
 9           8  24.8149919  11.7383922     0
 10          9  20.6962530   8.3151278     0
 11         10   9.8170045   6.4531495     0
 12         11   1.1444060   4.2462051     0
 13         12  16.4690145   1.5680112     3
 14         13  12.7482412   1.7532663     2
 15         14   3.8868516   0.8463565     0
 16         15   0.3059696   0.4737271     5
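
A quick consistency check on the map above can be done in plain R. The `COUNT` values are copied from the output shown; the variable name `count` is just a local name for this sketch.

```r
# COUNT column copied from the som$map output above, in CLUSTER_ID order 0..15.
count <- c(2, 2, 0, 3, 1, 0, 1, 1, 0, 0, 0, 0, 3, 2, 0, 5)

length(count)          # 16 unit cells, i.e. the 4 x 4 map
sum(count)             # 20, one per input row (TRANS_ID 0 to 19)
sum(count == 0)        # 7 unit cells attracted no tuples
which.max(count) - 1   # 15, the CLUSTER_ID holding the most tuples
```

The counts sum to the number of input rows, so every tuple is assigned to exactly one unit cell.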
