hanaml.Discretize is an R wrapper for SAP HANA PAL Discretize.
hanaml.Discretize( data = NULL, key = NULL, features = NULL, binning.variable = NULL, strategy = NULL, smoothing = NULL, col.smoothing = NULL, n.bins = NULL, bin.size = NULL, n.sd = NULL, categorical.variable = NULL, save.model = NULL )
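| Argument | Description |
|---|---|
| data | DataFrame. Input data. |
| key | character. Name of the ID column. |
| features | character vector, optional. Names of the feature columns to discretize. |
| binning.variable | character. Name of the numeric column whose values determine the bins. |
| strategy | character. Binning strategy: "uniform.number" (equal-width bins from a number of bins), "uniform.size" (equal-width bins from a bin width), "quantile" (equal number of records per bin) or "sd" (bin boundaries based on mean and standard deviation). |
| smoothing | character, optional. Default smoothing method: "no", "bin.means", "bin.medians" or "bin.boundaries". Applies only to non-categorical attributes whose smoothing method is not specified in col.smoothing. |
| col.smoothing | list, optional. Column-specific smoothing methods, e.g. list(ATT2 = "bin.means"). Applies only to numerical attributes. |
| n.bins | integer, optional. Number of bins (used with strategy = "uniform.number"). |
| bin.size | double, optional. Width of each bin (used with strategy = "uniform.size"). |
| n.sd | integer, optional. Number of standard deviations defining the bins (used with strategy = "sd"). |
| categorical.variable | character or character vector, optional. INTEGER columns to treat as categorical. Valid only for variables of INTEGER type; ignored otherwise. |
| save.model | logical, optional. Whether to save the model. |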
A "Discretize" object with the following attributes:

- result: DataFrame. Discretize results, structured as follows:
  - ID: data ID, named as in the input DataFrame.
  - FEATURES: feature columns, smoothed within each bin.
- assignment: DataFrame. Bin assignment results, structured as follows:
  - ID: data ID, named as in the input DataFrame.
  - BIN_INDEX: bin index.
- model: DataFrame. Model results, structured as follows:
  - ROW_INDEX: row index.
  - MODEL_CONTENT: model content.
- statistics: DataFrame. Statistic results, structured as follows:
  - STAT_NAME: statistic name.
  - STAT_VALUE: statistic value.
Discretize is an enhanced version of the binning function that can be applied to a table with multiple columns. It partitions the table rows into segments called bins according to the values of binning.variable, then applies a smoothing method within each bin of each column.
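The partition-then-smooth idea can be sketched in base R for a single numeric column. This is a minimal illustration, not the PAL implementation: it assumes that "uniform.number" means equal-width bins spanning the observed range of the binning variable, and it uses the textbook definitions of smoothing by bin means, medians and boundaries. The helpers `equal_width_bins` and `smooth_by_bin` are hypothetical names, and PAL's handling of bin edges and ties may differ.

```r
# Hypothetical helper (not part of hana.ml): assign each value of x to one of
# n_bins equal-width bins spanning [min(x), max(x)]; max(x) goes to the last bin.
equal_width_bins <- function(x, n_bins) {
  width <- (max(x) - min(x)) / n_bins
  idx <- floor((x - min(x)) / width) + 1
  pmin(idx, n_bins)
}

# Hypothetical helper: replace each value of y by a statistic of its own bin.
smooth_by_bin <- function(y, bin,
                          method = c("bin.means", "bin.medians", "bin.boundaries")) {
  method <- match.arg(method)
  out <- y
  for (b in unique(bin)) {
    in_bin <- bin == b
    v <- y[in_bin]
    out[in_bin] <- switch(method,
      bin.means   = mean(v),
      bin.medians = median(v),
      # Snap each value to the nearer of the bin's min and max;
      # a value exactly halfway (per floating point) goes to the min.
      bin.boundaries = ifelse(v - min(v) <= max(v) - v, min(v), max(v))
    )
  }
  out
}

# ATT1 values from the example input.
att1 <- c(10.0, 10.1, 10.2, 10.4, 10.3, 40.0, 40.1, 40.2, 40.4, 40.3,
          90.0, 90.1, 90.2, 90.4, 90.3)
bin <- equal_width_bins(att1, n_bins = 3)
print(bin)  # bin: 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3
print(smooth_by_bin(att1, bin, "bin.medians"))
print(smooth_by_bin(att1, bin, "bin.boundaries"))
```

With three equal-width bins each group of five rows lands in its own bin. Bin medians map the first bin to 10.2, whereas bin boundaries keep 10.0 and 10.4 unchanged, since each value snaps to the bin's minimum or maximum.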
Input DataFrame data:
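| ID | ATT1 | ATT2 | ATT3 | ATT4 |
|---|---|---|---|---|
| 1 | 10.0 | 100 | 1 | A |
| 2 | 10.1 | 101 | 1 | A |
| 3 | 10.2 | 100 | 1 | A |
| 4 | 10.4 | 103 | 1 | A |
| 5 | 10.3 | 100 | 1 | A |
| 6 | 40.0 | 400 | 4 | C |
| 7 | 40.1 | 402 | 4 | B |
| 8 | 40.2 | 400 | 4 | B |
| 9 | 40.4 | 402 | 4 | B |
| 10 | 40.3 | 400 | 4 | A |
| 11 | 90.0 | 900 | 2 | C |
| 12 | 90.1 | 903 | 1 | B |
| 13 | 90.2 | 901 | 2 | B |
| 14 | 90.4 | 900 | 1 | B |
| 15 | 90.3 | 900 | 1 | B |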
Call the function, which returns a "Discretize" object named discretize:
> discretize <- hanaml.Discretize(data,
key = "ID",
features = c("ATT1", "ATT2", "ATT3", "ATT4"),
binning.variable = "ATT1",
strategy = "uniform.number",
smoothing = "bin.medians",
col.smoothing = list(ATT2 = "bin.means"),
n.bins = 3,
categorical.variable = "ATT3")
Expected output:
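> discretize$result$Collect()

| ID | ATT1 | ATT2 | ATT3 | ATT4 |
|---|---|---|---|---|
| 1 | 10.2 | 100.8 | 1 | A |
| 2 | 10.2 | 100.8 | 1 | A |
| 3 | 10.2 | 100.8 | 1 | A |
| 4 | 10.2 | 100.8 | 1 | A |
| 5 | 10.2 | 100.8 | 1 | A |
| 6 | 40.2 | 400.8 | 4 | C |
| 7 | 40.2 | 400.8 | 4 | B |
| 8 | 40.2 | 400.8 | 4 | B |
| 9 | 40.2 | 400.8 | 4 | B |
| 10 | 40.2 | 400.8 | 4 | A |
| 11 | 90.2 | 900.8 | 2 | C |
| 12 | 90.2 | 900.8 | 1 | B |
| 13 | 90.2 | 900.8 | 2 | B |
| 14 | 90.2 | 900.8 | 1 | B |
| 15 | 90.2 | 900.8 | 1 | B |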
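As a cross-check, the expected output can be reproduced on a local data.frame with plain R, without a SAP HANA connection. This sketch assumes equal-width bins for "uniform.number"; in the expected output, ATT1 equals the per-bin median (which here coincides with the per-bin mean), ATT2 equals the per-bin mean (the col.smoothing override), and the categorical ATT3 and the character ATT4 are passed through unchanged. It is an illustration of the arithmetic, not how hana.ml or PAL computes the result.

```r
# Example input as a local data.frame (hypothetical stand-in for the HANA DataFrame).
df <- data.frame(
  ID   = 1:15,
  ATT1 = c(10.0, 10.1, 10.2, 10.4, 10.3, 40.0, 40.1, 40.2, 40.4, 40.3,
           90.0, 90.1, 90.2, 90.4, 90.3),
  ATT2 = c(100, 101, 100, 103, 100, 400, 402, 400, 402, 400,
           900, 903, 901, 900, 900),
  ATT3 = as.integer(c(1, 1, 1, 1, 1, 4, 4, 4, 4, 4, 2, 1, 2, 1, 1)),
  ATT4 = c("A", "A", "A", "A", "A", "C", "B", "B", "B", "A",
           "C", "B", "B", "B", "B"),
  stringsAsFactors = FALSE
)

# Equal-width binning of ATT1 into 3 bins (assumed "uniform.number" semantics).
n_bins <- 3
width <- (max(df$ATT1) - min(df$ATT1)) / n_bins
bin <- pmin(floor((df$ATT1 - min(df$ATT1)) / width) + 1, n_bins)

result <- df
# ATT1: replace each value by the median of its bin.
result$ATT1 <- ave(df$ATT1, bin, FUN = median)
# ATT2: replace each value by the mean of its bin (col.smoothing override).
result$ATT2 <- ave(df$ATT2, bin, FUN = mean)
# ATT3 (categorical) and ATT4 (character) are copied unchanged.
print(result)
```

`ave()` returns a vector of the same length as its input, with each element replaced by the statistic of its group, which matches the row-wise shape of the result DataFrame.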