hanaml.SOM is an R wrapper for the SAP HANA PAL Self-Organizing Maps algorithm.
hanaml.SOM(
  data = NULL,
  key = NULL,
  features = NULL,
  tol = NULL,
  normalization = NULL,
  random.state = NULL,
  height.of.map = NULL,
  width.of.map = NULL,
  kernel = NULL,
  alpha = NULL,
  learning.rate = NULL,
  grid = NULL,
  radius = NULL,
  batch.som = NULL,
  max.iter = NULL,
  decay = NULL
)
| Argument | Description |
|---|---|
| data | |
| key | |
| features | |
| tol | |
| normalization | Defaults to "no". |
| random.state | Defaults to -1. |
| height.of.map | |
| width.of.map | |
| kernel | Defaults to "gaussian". |
| alpha | |
| learning.rate | Will be replaced by |
| grid | Defaults to "hexagon". |
| radius | |
| batch.som | |
| max.iter | |
| decay | If both |
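To give the kernel, radius, and learning-rate arguments some intuition, here is a minimal base R sketch of a single online update step from the textbook SOM algorithm. It is not PAL's implementation: the helper `som_step`, the 2 x 2 rectangular grid, and the random starting weights are all hypothetical, and the names `alpha` and `radius` are borrowed loosely from the argument list without claiming they match PAL's exact semantics.

```r
# One textbook online SOM update step on a tiny 2 x 2 rectangular grid.
# Everything here is illustrative; it does not call SAP HANA or hanaml.SOM.
set.seed(1)
grid_xy <- as.matrix(expand.grid(x = 1:2, y = 1:2))  # unit-cell coordinates
weights <- matrix(runif(8), nrow = 4, ncol = 2)      # one weight vector per cell

# Hypothetical helper: update all weight vectors for one input tuple x.
som_step <- function(weights, grid_xy, x, alpha, radius) {
  # Best matching unit: the cell whose weight vector is closest to x.
  bmu <- which.min(rowSums(sweep(weights, 2, x)^2))
  # Squared distance on the grid from every cell to the BMU.
  g2 <- rowSums(sweep(grid_xy, 2, grid_xy[bmu, ])^2)
  # Gaussian neighborhood: 1 at the BMU, decaying with grid distance.
  h <- exp(-g2 / (2 * radius^2))
  # Move each weight vector toward x, scaled by alpha and the neighborhood.
  weights + alpha * h * (-sweep(weights, 2, x))
}

x <- c(0.5, 0.5)
new_weights <- som_step(weights, grid_xy, x, alpha = 0.5, radius = 1)
print(new_weights)
```

In this sketch a larger `radius` spreads each update over more neighboring cells, while a smaller `alpha` makes every update more conservative.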
A "SOM" object with the following attributes:
map : DataFrame
The map after training. The structure is as follows:
1st column: CLUSTER_ID, int. Unit cell ID.
Other columns except the last one: FEATURE (in input data) column with prefix "WEIGHT_", float. Weight vectors used to simulate the original tuples.
Last column: COUNT, int. Number of original tuples that every unit cell contains.
labels : DataFrame
The label of input data, the structure is as follows:
1st column: ID of original tuples, with the same column name and data type as the ID column in the input table.
2nd column: BMU, int. Best match unit (BMU).
3rd column: DISTANCE, float. The distance between the tuple and its BMU.
4th column: SECOND_BMU, int. Second BMU.
5th column: IS_ADJACENT, int. Indicates whether the BMU and the second BMU are adjacent.
0: Not adjacent
1: Adjacent
model : DataFrame
The SOM model.
Input DataFrame data:
> data$Collect()
   TRANS_ID  V000  V001
1         0  0.10  0.20
2         1  0.22  0.25
3         2  0.30  0.40
4         3  0.40  0.50
......
16       15 49.00 40.10
17       16 50.10 50.20
18       17 50.20 48.30
19       18 55.30 50.40
20       19 50.40 56.50
Call the function:
> som <- hanaml.SOM(data,
                    key = "TRANS_ID",
                    tol = 1.0e-6,
                    normalization = "no",
                    random.state = 1,
                    height.of.map = 4,
                    width.of.map = 4,
                    kernel = "gaussian",
                    learning.rate = "exponential",
                    grid = "hexagon",
                    batch.som = FALSE,
                    max.iter = 4000)
Output:
> som$map$Collect()
   CLUSTER_ID WEIGHT_V000 WEIGHT_V001 COUNT
1           0  52.8376884  53.4653266     2
2           1  50.1502513  49.2452261     2
3           2  18.5976067  27.1745897     0
4           3   1.2676711  15.2676711     3
5           4  49.0000211  40.1000986     1
6           5  33.4309941  34.4504915     0
7           6   3.4999807  15.8999720     1
8           7   2.2000010  11.2000508     1
9           8  24.8149919  11.7383922     0
10          9  20.6962530   8.3151278     0
11         10   9.8170045   6.4531495     0
12         11   1.1444060   4.2462051     0
13         12  16.4690145   1.5680112     3
14         13  12.7482412   1.7532663     2
15         14   3.8868516   0.8463565     0
16         15   0.3059696   0.4737271     5
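The BMU, DISTANCE, and SECOND_BMU columns of the labels attribute can be approximated locally from the map shown above. The base R sketch below copies the printed weight vectors and the nine input rows that are visible in the data listing, then ranks unit cells by distance for each row. It assumes Euclidean distance, which this page does not confirm for PAL; the helper `find_bmu` is hypothetical, and IS_ADJACENT is left out because it depends on the hexagonal grid layout.

```r
# Weight vectors copied from the som$map output above (one row per unit cell).
map <- data.frame(
  CLUSTER_ID  = 0:15,
  WEIGHT_V000 = c(52.8376884, 50.1502513, 18.5976067, 1.2676711,
                  49.0000211, 33.4309941, 3.4999807, 2.2000010,
                  24.8149919, 20.6962530, 9.8170045, 1.1444060,
                  16.4690145, 12.7482412, 3.8868516, 0.3059696),
  WEIGHT_V001 = c(53.4653266, 49.2452261, 27.1745897, 15.2676711,
                  40.1000986, 34.4504915, 15.8999720, 11.2000508,
                  11.7383922, 8.3151278, 6.4531495, 4.2462051,
                  1.5680112, 1.7532663, 0.8463565, 0.4737271)
)

# The nine input rows that are visible in the data listing above.
tuples <- data.frame(
  TRANS_ID = c(0, 1, 2, 3, 15, 16, 17, 18, 19),
  V000 = c(0.10, 0.22, 0.30, 0.40, 49.00, 50.10, 50.20, 55.30, 50.40),
  V001 = c(0.20, 0.25, 0.40, 0.50, 40.10, 50.20, 48.30, 50.40, 56.50)
)

# Hypothetical helper: rank unit cells by Euclidean distance to one tuple.
# Euclidean distance is an assumption, not something this page states.
find_bmu <- function(x, map) {
  w <- as.matrix(map[, c("WEIGHT_V000", "WEIGHT_V001")])
  d <- unname(sqrt(rowSums(sweep(w, 2, x)^2)))
  ord <- order(d)
  data.frame(BMU = map$CLUSTER_ID[ord[1]],
             DISTANCE = d[ord[1]],
             SECOND_BMU = map$CLUSTER_ID[ord[2]])
}

x <- as.matrix(tuples[, c("V000", "V001")])
rows <- lapply(seq_len(nrow(x)), function(i) find_bmu(x[i, ], map))
local_labels <- cbind(TRANS_ID = tuples$TRANS_ID, do.call(rbind, rows))
print(local_labels)
```

Under the Euclidean assumption, each of these nine rows lands in a cell that has a nonzero COUNT in the map above (cells 15, 4, 1, and 0). The authoritative values remain those returned in the labels attribute.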