hanaml.Discretize {hana.ml.r}    R Documentation
An enhanced version of the binning function that can be applied to a table with multiple columns. It partitions the table rows into segments called bins, then applies a smoothing method within each bin, separately for each column.
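The partition-then-smooth procedure can be illustrated locally in base R. This is a minimal sketch, not the SAP HANA PAL implementation, and it does not connect to SAP HANA. It assumes that strategy = "uniform.number" means n.bins equal-width intervals over the range of the binning column, and it smooths every non-key, non-categorical numeric column by its bin mean. The function discretize_sketch() and the toy data are hypothetical.

```r
# Hypothetical local sketch of Discretize; it does not call SAP HANA or PAL.
discretize_sketch <- function(df, key, binning.variable, n.bins,
                              categorical.variable = character(0)) {
  x <- df[[binning.variable]]
  # Assumption: "uniform.number" = n.bins equal-width intervals over [min, max]
  breaks <- seq(min(x), max(x), length.out = n.bins + 1)
  # findInterval() returns 1..n.bins (n.bins + 1 for max(x)); pmin() folds
  # max(x) back into the last bin
  bin <- as.integer(pmin(findInterval(x, breaks), n.bins))
  out <- df
  for (col in setdiff(names(df), c(key, categorical.variable))) {
    if (is.numeric(df[[col]])) {
      # Smooth column by column: each value becomes the mean of its bin
      out[[col]] <- ave(df[[col]], bin, FUN = mean)
    }
  }
  # Mirror the shapes of the result and assignment attributes
  list(result = out,
       assignment = data.frame(ID = df[[key]], BIN_INDEX = bin))
}

df <- data.frame(ID = 1:6,
                 ATT1 = c(1, 2, 3, 7, 8, 9),
                 ATT2 = c(10, 20, 30, 70, 80, 90))
res <- discretize_sketch(df, key = "ID", binning.variable = "ATT1", n.bins = 2)
print(res$assignment$BIN_INDEX)  # 1 1 1 2 2 2
print(res$result$ATT2)           # 20 20 20 80 80 80
```

The bins are defined once, from the binning column only, and every smoothed column reuses the same row-to-bin assignment.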
hanaml.Discretize(conn.context, data = NULL,
                  key = NULL, features = NULL,
                  binning.variable = NULL, strategy = NULL,
                  smoothing = NULL, col.smoothing = NULL, n.bins = NULL,
                  bin.size = NULL, n.sd = NULL, categorical.variable = NULL,
                  save.model = NULL)
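conn.context: Connection to the SAP HANA system.
data: Input DataFrame.
key: Name of the ID column.
features: Names of the feature columns to be discretized.
binning.variable: Name of the column whose values are used to partition the rows into bins.
strategy: Binning method, e.g. "uniform.number".
smoothing: Default smoothing method applied to all numerical columns, e.g. "bin.boundaries". No default value.
col.smoothing: Column-specific smoothing methods that override smoothing for the listed columns, e.g. col.smoothing = list("ATT1" = "bin.means", "ATT2" = "bin.boundaries"), or equivalently col.smoothing = c(ATT1 = "bin.means", ATT2 = "bin.boundaries"). Only applies to numerical attributes.
n.bins: Number of bins.
bin.size: Size of each bin.
n.sd: Number of standard deviations.
categorical.variable: Names of columns to be treated as categorical; in the example below, the integer column ATT3 is declared categorical and is left unsmoothed.
save.model: Indicates whether the model is saved.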
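The interaction between the overall smoothing method and the per-column col.smoothing override can be sketched in base R. The helpers smooth_bin() and method_for() are hypothetical and cover only the two methods named on this page. The tie rule for "bin.boundaries" (a value equidistant from the bin minimum and maximum goes to the minimum) is an assumption and may differ from SAP HANA PAL.

```r
# Hypothetical helpers; only "bin.means" and "bin.boundaries" are covered.
smooth_bin <- function(v, method) {
  switch(method,
         # every value in the bin becomes the bin mean
         bin.means = rep(mean(v), length(v)),
         # Assumption: each value moves to the nearer of the bin's min and max,
         # ties going to the min
         bin.boundaries = ifelse(v - min(v) <= max(v) - v, min(v), max(v)),
         stop("unsupported smoothing method: ", method))
}

# A column named in col.smoothing uses its own method; any other column
# falls back to the overall smoothing method.
method_for <- function(col, smoothing, col.smoothing) {
  m <- as.list(col.smoothing)[[col]]
  if (is.null(m)) smoothing else m
}

v <- c(1, 2, 4, 9, 10)                  # toy values falling in one bin
print(smooth_bin(v, "bin.means"))       # 5.2 5.2 5.2 5.2 5.2
print(smooth_bin(v, "bin.boundaries"))  # 1 1 1 10 10

# The list form and the named-vector form of col.smoothing carry the same pairs
cs_list <- list(ATT1 = "bin.means", ATT2 = "bin.boundaries")
cs_vec  <- c(ATT1 = "bin.means", ATT2 = "bin.boundaries")
print(identical(as.list(cs_vec), cs_list))                          # TRUE
print(method_for("ATT3", "bin.boundaries", c(ATT1 = "bin.means")))  # "bin.boundaries"
```

Converting col.smoothing with as.list() lets the same lookup accept either of the two documented forms.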
R6Class object.
A "Discretize" object with the following attributes:
result: DataFrame
Discretize results, structured as follows:
- ID: data ID, name as shown in input DataFrame.
- FEATURES: data smoothed within each bin.
assignment: DataFrame
Assignment results, structured as follows:
- ID: data ID, name as shown in input DataFrame.
- BIN_INDEX: bin index.
model: DataFrame
Model results, structured as follows:
- ROW_INDEX: row index.
- MODEL_CONTENT: model contents.
statistics: DataFrame
Statistic results, structured as follows:
- STAT_NAME: statistic name.
- STAT_VALUE: statistic value.
## Not run:
Input DataFrame data for training:
   ID ATT1 ATT2 ATT3 ATT4
1   1 10.0  100    1    A
2   2 10.1  101    1    A
3   3 10.2  100    1    A
4   4 10.4  103    1    A
5   5 10.3  100    1    A
6   6 40.0  400    4    C
7   7 40.1  402    4    B
8   8 40.2  400    4    B
9   9 40.4  402    4    B
10 10 40.3  400    4    A
11 11 90.0  900    2    C
12 12 90.1  903    1    B
13 13 90.2  901    2    B
14 14 90.4  900    1    B
15 15 90.3  900    1    B
Model training; a "Discretize" object, discretize, is returned:
> discretize <- hanaml.Discretize(conn, data, key = "ID",
features = c("ATT1", "ATT2", "ATT3",
"ATT4"),
binning.variable = "ATT1",
strategy = "uniform.number",
smoothing = "bin.boundaries",
col.smoothing = list(ATT2 = "bin.means"),
n.bins = 3, categorical.variable = "ATT3")
Expected output:
> discretize$result$Collect()
   ID ATT1  ATT2 ATT3 ATT4
1   1 10.2 100.8    1    A
2   2 10.2 100.8    1    A
3   3 10.2 100.8    1    A
4   4 10.2 100.8    1    A
5   5 10.2 100.8    1    A
6   6 40.2 400.8    4    C
7   7 40.2 400.8    4    B
8   8 40.2 400.8    4    B
9   9 40.2 400.8    4    B
10 10 40.2 400.8    4    A
11 11 90.2 900.8    2    C
12 12 90.2 900.8    1    B
13 13 90.2 900.8    2    B
14 14 90.2 900.8    1    B
15 15 90.2 900.8    1    B
## End(Not run)
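The ATT2 column of the expected output can be checked locally in base R, without a SAP HANA connection. This sketch assumes that strategy = "uniform.number" with n.bins = 3 splits the range of ATT1 into three equal-width intervals; ATT2 is then replaced by its bin mean, as requested by col.smoothing = list(ATT2 = "bin.means"). The variable names are illustrative only.

```r
# Example data from above (ATT1 and ATT2 only); no SAP HANA connection needed.
att1 <- c(10.0, 10.1, 10.2, 10.4, 10.3, 40.0, 40.1, 40.2, 40.4, 40.3,
          90.0, 90.1, 90.2, 90.4, 90.3)
att2 <- c(100, 101, 100, 103, 100, 400, 402, 400, 402, 400,
          900, 903, 901, 900, 900)

# Assumption: strategy = "uniform.number" with n.bins = 3 gives three
# equal-width intervals over the range of ATT1
n.bins <- 3
breaks <- seq(min(att1), max(att1), length.out = n.bins + 1)
bin <- as.integer(pmin(findInterval(att1, breaks), n.bins))

# col.smoothing = list(ATT2 = "bin.means"): replace ATT2 by its bin mean
att2.smoothed <- ave(att2, bin, FUN = mean)
print(bin)            # 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3
print(att2.smoothed)  # 100.8 (rows 1-5), 400.8 (rows 6-10), 900.8 (rows 11-15)
```

ATT3 and ATT4 are not part of the sketch; in the expected output they are unchanged, since ATT3 is declared categorical and ATT4 holds non-numerical values.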