| hanaml.SOM {hana.ml.r} | R Documentation |
hanaml.SOM is an R wrapper for the PAL Self-Organizing Maps algorithm.
hanaml.SOM(conn.context,
data = NULL,
key = NULL,
features = NULL,
tol = NULL,
normalization = NULL,
random.state = NULL,
height.of.map = NULL,
width.of.map = NULL,
kernel = NULL,
alpha = NULL,
learning.rate = NULL,
grid = NULL,
radius = NULL,
batch.som = NULL,
max.iter = NULL)
conn.context |
|
data |
|
key |
|
features |
|
tol |
|
normalization |
Defaults to "no". |
random.state |
Defaults to -1. |
height.of.map |
|
width.of.map |
|
kernel |
Defaults to "gaussian". |
alpha |
|
learning.rate |
Defaults to "exponential". |
grid |
Defaults to "hexagon". |
radius |
|
batch.som |
|
max.iter |
|
R6Class object.
A "SOM" object with the following attributes:
map : DataFrame
The map after training. The structure is as follows:
- 1st column: CLUSTER_ID, int. Unit cell ID.
- Other columns except the last one: the feature column names from the
input data, prefixed with "WEIGHT_", float. Weight vectors used to simulate
the original tuples.
- Last column: COUNT, int. Number of original tuples that
each unit cell contains.
labels : DataFrame
The labels of the input data. The structure is as follows:
- 1st column: ID, with the same column name and data type as the ID column
of the input table. ID of original tuples.
- 2nd column: BMU, int. Best match unit (BMU).
- 3rd column: DISTANCE, float. The distance between the tuple and its BMU.
- 4th column: SECOND_BMU, int. Second BMU.
- 5th column: IS_ADJACENT, int.
Indicates whether the BMU and the second BMU are adjacent.
- 0: Not adjacent
- 1: Adjacent
model : DataFrame
The SOM model.
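The relationship between the map weights and the BMU, DISTANCE, and SECOND_BMU columns of labels can be illustrated with a minimal base R sketch. It assumes Euclidean distance, which this page does not state, and uses only four unit cells copied from the example map output for brevity; it is not the PAL implementation and does not compute IS_ADJACENT, which depends on the grid layout.

```r
# Illustrative subset of unit cells (CLUSTER_ID plus weight vector),
# copied from the example map output on this page.
map <- data.frame(
  CLUSTER_ID  = c(7, 11, 14, 15),
  WEIGHT_V000 = c(2.2000010, 1.1444060, 3.8868516, 0.3059696),
  WEIGHT_V001 = c(11.2000508, 4.2462051, 0.8463565, 0.4737271)
)

# One input tuple (TRANS_ID 0 in the example data).
tuple <- c(V000 = 0.10, V001 = 0.20)

# Euclidean distance from the tuple to every weight vector
# (the metric is an assumption, not taken from the PAL documentation).
weights <- as.matrix(map[, c("WEIGHT_V000", "WEIGHT_V001")])
dists <- sqrt(rowSums(sweep(weights, 2, tuple)^2))

# Rank units by distance: the closest is the BMU,
# the runner-up is the second BMU.
ranked     <- order(dists)
bmu        <- map$CLUSTER_ID[ranked[1]]
second_bmu <- map$CLUSTER_ID[ranked[2]]
distance   <- unname(dists[ranked[1]])

print(c(BMU = bmu, SECOND_BMU = second_bmu))
print(distance)
```

With these four cells, the tuple's nearest unit is CLUSTER_ID 15 and the runner-up is CLUSTER_ID 14.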
## Not run:
Input DataFrame for clustering:
> data$Collect()
TRANS_ID V000 V001
1 0 0.10 0.20
2 1 0.22 0.25
3 2 0.30 0.40
4 3 0.40 0.50
5 4 0.50 1.00
6 5 1.10 15.10
7 6 2.20 11.20
8 7 1.30 15.30
9 8 1.40 15.40
10 9 3.50 15.90
11 10 13.10 1.10
12 11 16.20 1.50
13 12 16.30 1.30
14 13 12.40 2.40
15 14 16.90 1.90
16 15 49.00 40.10
17 16 50.10 50.20
18 17 50.20 48.30
19 18 55.30 50.40
20 19 50.40 56.50
> som <- hanaml.SOM(conn,
data,
tol = 1.0e-6,
normalization = "no",
random.state = 1,
height.of.map = 4,
width.of.map = 4,
kernel = "gaussian",
learning.rate = "exponential",
grid = "hexagon",
batch.som = FALSE,
max.iter = 4000)
Expected output:
> som$map$Collect()
CLUSTER_ID WEIGHT_V000 WEIGHT_V001 COUNT
1 0 52.8376884 53.4653266 2
2 1 50.1502513 49.2452261 2
3 2 18.5976067 27.1745897 0
4 3 1.2676711 15.2676711 3
5 4 49.0000211 40.1000986 1
6 5 33.4309941 34.4504915 0
7 6 3.4999807 15.8999720 1
8 7 2.2000010 11.2000508 1
9 8 24.8149919 11.7383922 0
10 9 20.6962530 8.3151278 0
11 10 9.8170045 6.4531495 0
12 11 1.1444060 4.2462051 0
13 12 16.4690145 1.5680112 3
14 13 12.7482412 1.7532663 2
15 14 3.8868516 0.8463565 0
16 15 0.3059696 0.4737271 5
## End(Not run)
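As a sanity check on how COUNT relates to the weight vectors, the following base R sketch assigns each example tuple to its nearest unit cell and tallies the assignments. It needs no HANA connection: the input tuples and the map are hard-coded from the listings above. Euclidean distance is an assumption, and the code only illustrates the relationship between the two tables, not how PAL computes them.

```r
# Example input tuples (V000, V001), one row per TRANS_ID 0..19.
x <- matrix(c(
   0.10,  0.20,   0.22,  0.25,   0.30,  0.40,   0.40,  0.50,
   0.50,  1.00,   1.10, 15.10,   2.20, 11.20,   1.30, 15.30,
   1.40, 15.40,   3.50, 15.90,  13.10,  1.10,  16.20,  1.50,
  16.30,  1.30,  12.40,  2.40,  16.90,  1.90,  49.00, 40.10,
  50.10, 50.20,  50.20, 48.30,  55.30, 50.40,  50.40, 56.50
), ncol = 2, byrow = TRUE)

# Trained weights (WEIGHT_V000, WEIGHT_V001), one row per CLUSTER_ID 0..15.
w <- matrix(c(
  52.8376884, 53.4653266,  50.1502513, 49.2452261,
  18.5976067, 27.1745897,   1.2676711, 15.2676711,
  49.0000211, 40.1000986,  33.4309941, 34.4504915,
   3.4999807, 15.8999720,   2.2000010, 11.2000508,
  24.8149919, 11.7383922,  20.6962530,  8.3151278,
   9.8170045,  6.4531495,   1.1444060,  4.2462051,
  16.4690145,  1.5680112,  12.7482412,  1.7532663,
   3.8868516,  0.8463565,   0.3059696,  0.4737271
), ncol = 2, byrow = TRUE)

# COUNT column from the expected map output.
count_expected <- c(2L, 2L, 0L, 3L, 1L, 0L, 1L, 1L,
                    0L, 0L, 0L, 0L, 3L, 2L, 0L, 5L)

# For each tuple, the row index of the nearest weight vector
# (squared Euclidean distance; the metric is an assumption).
nearest <- apply(x, 1, function(p) which.min(colSums((t(w) - p)^2)))

# Tally tuples per unit; fixed factor levels keep empty units as zeros.
count_recomputed <- as.integer(
  table(factor(nearest, levels = seq_len(nrow(w))))
)

# Row index i corresponds to CLUSTER_ID i - 1.
print(data.frame(CLUSTER_ID = seq_len(nrow(w)) - 1L,
                 COUNT = count_recomputed))
print(identical(count_recomputed, count_expected))
```

For this data set the recomputed counts match the COUNT column, so units with COUNT 0 are cells that no tuple selected as its nearest unit.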