hanaml.FeatureNormalizer.Rd
hanaml.FeatureNormalizer is an R wrapper for the SAP HANA PAL scale algorithm.
hanaml.FeatureNormalizer(
method = NULL,
data = NULL,
features = NULL,
key = NULL,
z.score.method = NULL,
new.max = NULL,
new.min = NULL,
thread.ratio = NULL,
division.by.zero.handler = NULL
)
method : {"min.max", "z.score", "decimal"}, optional
Invokes one of the following scaling methods:
"min.max": Min-max normalization.
"z.score": Z-score normalization.
"decimal": Decimal scaling normalization.
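As a rough illustration of what the three methods compute, here is a minimal base R sketch. It is not the PAL implementation: the helper names are hypothetical, the z-score uses R's sample standard deviation (PAL may use a different estimator), and decimal scaling is assumed to divide by the smallest power of ten that brings every absolute value below 1.

```r
# Hypothetical helpers, base R only; not the PAL implementation.
min_max <- function(x) {
  (x - min(x)) / (max(x) - min(x))          # maps min(x) to 0 and max(x) to 1
}

z_score <- function(x) {
  (x - mean(x)) / sd(x)                     # sd() is the sample standard deviation
}

decimal_scaling <- function(x) {
  # Assumption: smallest integer j such that max(|x|) / 10^j < 1 (x not all zero).
  j <- floor(log10(max(abs(x)))) + 1
  x / 10^j
}

x <- c(6.0, 12.1, 13.5, 15.4, 10.2)         # X1 values shown in the example input
min_max(x)
z_score(x)
decimal_scaling(x)
```

Because these helpers only see the five values above, their results will not match a run of hanaml.FeatureNormalizer on a larger table.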
data : DataFrame
DataFrame containing the data.
features : character or list of characters, optional
Name of feature columns.
If not provided, it defaults to all non-key, non-label columns of data.
key : character
Name of the ID column.
z.score.method : {"mean.standard", "mean.mean", "median.median"}, optional
Only valid when method is "z.score".
"mean.standard": Mean-standard deviation.
"mean.mean": Mean-mean deviation.
"median.median": Median-median absolute deviation.
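The three z-score variants differ in which center and which spread they use. The base R sketch below shows one common reading of each option; the function name is hypothetical and the exact estimators used by PAL (for example sample versus population standard deviation) are an assumption.

```r
# Hypothetical helper illustrating the three z.score.method options in base R.
z_score_variant <- function(x,
                            z.score.method = c("mean.standard",
                                               "mean.mean",
                                               "median.median")) {
  z.score.method <- match.arg(z.score.method)
  switch(z.score.method,
    # center: mean, spread: standard deviation (sample version, an assumption)
    mean.standard = (x - mean(x)) / sd(x),
    # center: mean, spread: mean absolute deviation from the mean
    mean.mean     = (x - mean(x)) / mean(abs(x - mean(x))),
    # center: median, spread: median absolute deviation from the median
    median.median = (x - median(x)) / median(abs(x - median(x)))
  )
}

x <- c(1, 2, 3, 4, 10)                 # made-up values with one outlier
z_score_variant(x, "mean.standard")
z_score_variant(x, "mean.mean")
z_score_variant(x, "median.median")    # -2 -1 0 1 7
```

The median-based variant is less sensitive to the outlier because neither its center nor its spread is pulled toward the value 10.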
new.max : double, optional
The new maximum value for min.max normalization.
Mandatory and valid only when method is "min.max".
new.min : double, optional
The new minimum value for min.max normalization.
Mandatory and valid only when method is "min.max".
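To see how new.min and new.max shape the result, the following base R sketch applies the usual min-max formula to a target interval. The helper name is hypothetical and the formula is the textbook definition, assumed here to correspond to what the PAL algorithm computes.

```r
# Hypothetical helper: textbook min-max rescaling into [new.min, new.max].
rescale_min_max <- function(x, new.min, new.max) {
  (x - min(x)) / (max(x) - min(x)) * (new.max - new.min) + new.min
}

x2 <- c(9.0, 8.3, 15.3, 18.7, 19.8)   # X2 values shown in the example input

# The smallest value maps to new.min and the largest to new.max.
rescale_min_max(x2, new.min = 0, new.max = 1)
rescale_min_max(x2, new.min = -1, new.max = 1)
```

Only the target interval changes between the two calls; the relative spacing of the values is preserved.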
thread.ratio : double, optional
Controls the proportion of available threads that can be used by this function.
The value range is from 0 to 1, where 0 indicates a single thread and 1 indicates all available threads.
Values between 0 and 1 use up to that percentage of available threads. Values outside this range are ignored.
Defaults to 0.
division.by.zero.handler : {"ignore", "throw.error"}, optional
Specifies what to do when encountering a division by zero.
"ignore": ignores the column when encountering a division by zero.
"throw.error": throws an error when encountering a division by zero.
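A division by zero typically arises when a column is constant, so that its range (or its spread) is zero. The base R sketch below imitates the two handler options for min-max scaling. The function name is hypothetical, and returning the column unchanged under "ignore" is an assumption about what ignoring a column means, not documented PAL behavior.

```r
# Hypothetical helper imitating the two division-by-zero handlers.
scale_column <- function(x, handler = c("ignore", "throw.error")) {
  handler <- match.arg(handler)
  span <- max(x) - min(x)
  if (span == 0) {
    if (handler == "throw.error") {
      stop("division by zero: column has zero range")
    }
    return(x)   # "ignore": assumed here to leave the column unscaled
  }
  (x - min(x)) / span
}

scale_column(c(0, 5, 10))                  # 0.0 0.5 1.0
scale_column(c(5, 5, 5), "ignore")         # returned unchanged in this sketch
# scale_column(c(5, 5, 5), "throw.error")  # would stop with an error
```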
Returns a "FeatureNormalizer" object with the following values:
result : DataFrame
Scaled dataset, structured as follows:
DATA_ID : ID column, named as in the input DataFrame.
DATA_FEATURES : feature columns, named as in the input DataFrame.
model : DataFrame
Trained model content, structured as follows:
ID : Scaling model ID
MODEL_CONTENT : Scaling model saved as a JSON string. The table must be a column table. The minimum length of each unit (row) is 5000.
statistics : DataFrame
Statistic results, structured as follows:
STAT_NAME : statistic name.
STAT_VALUE : statistic value.
Normalizes input data and generates a scaling model using one of three scaling methods: min.max normalization, z.score normalization, or decimal scaling normalization. The transform function can then be used to apply the scaling model to a given DataFrame.
Input DataFrame data:
> data$Collect()
ID X1 X2
1 0 6.0 9.0
2 1 12.1 8.3
3 2 13.5 15.3
4 3 15.4 18.7
5 4 10.2 19.8
Call the function:
fn <- hanaml.FeatureNormalizer(data = data,
                               key = "ID",
                               method = "min.max",
                               new.max = 1.0,
                               new.min = 0.0)
Output:
> fn$result$Collect()
ID X1 X2
1 0 0.0000000 0.03317536
2 1 0.1865443 0.00000000
3 2 0.2293578 0.33175355
4 3 0.2874618 0.49289100
5 4 0.1284404 0.54502370
6 5 0.5290520 0.58293839
7 6 0.5626911 0.75829384
8 7 0.7522936 0.80568720
9 8 0.8103976 0.91469194
10 9 0.5993884 0.95734597
11 10 1.0000000 1.00000000
12 11 1.0000000 1.00000000