Feature Normalizer

hanaml.FeatureNormalizer is an R wrapper for the SAP HANA PAL scale algorithm.

hanaml.FeatureNormalizer(
  method = NULL,
  data = NULL,
  features = NULL,
  key = NULL,
  z.score.method = NULL,
  new.max = NULL,
  new.min = NULL,
  thread.ratio = NULL,
  division.by.zero.handler = NULL
)

Arguments

method	`{'min.max', 'z.score', 'decimal'}, optional` Selects one of the following scaling methods: `'min.max'` - min-max normalization. `'z.score'` - z-score normalization. `'decimal'` - decimal scaling normalization.
data	`DataFrame` DataFrame containing the data.
features	`character or list of characters, optional` Names of the feature columns. If not provided, it defaults to all non-key, non-label columns of data.
key	`character` Name of the ID column.
z.score.method	`{'mean.standard', 'mean.mean', 'median.median'}, optional` Only valid when `method` is 'z.score'. `'mean.standard'` - mean-standard deviation. `'mean.mean'` - mean-mean deviation. `'median.median'` - median-median absolute deviation.
new.max	`double, optional` The new maximum value for min-max normalization. Mandatory and valid only when `method` is 'min.max'.
new.min	`double, optional` The new minimum value for min-max normalization. Mandatory and valid only when `method` is 'min.max'.
thread.ratio	`double, optional` Controls the proportion of available threads that this function can use. The value range is from 0 to 1, where 0 indicates a single thread and 1 indicates all available threads. Values between 0 and 1 use up to that percentage of available threads. Values outside this range are ignored. Defaults to 0.
division.by.zero.handler	`c("ignore", "throw.error"), optional` Specifies what to do when encountering a division by zero. "ignore": ignores the column when encountering a division by zero. "throw.error": throws an error when encountering a division by zero.

Value

Returns a "FeatureNormalizer" object with the following values:

result : DataFrame
Scaled dataset from fit and fit_transform methods.
- DATA_ID: name as shown in input DataFrame.
- DATA_FEATURES: name as shown in input table column name.
model : DataFrame
Trained model content, structured as follows:
- ID : Scaling model ID
- MODEL_CONTENT : Binning model saved as JSON string. The table must be a column table. The minimum length of each unit (row) is 5000.
statistics : DataFrame
Statistic results, structured as follows:
- STAT_NAME : statistic name.
- STAT_VALUE : statistic value.

Details

Normalizes the input data and generates a scaling model using one of three scaling methods: min-max normalization, z-score normalization, or decimal scaling normalization. The transform function can then be used to apply the scaling model to a given DataFrame.
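
To make the three methods concrete, the following is a minimal base R sketch of the usual textbook formulas. It is not the PAL implementation: the helper names `min_max`, `z_score` and `decimal_scale` are hypothetical, and details such as the standard deviation denominator (R's `sd()` divides by n - 1) are assumptions that may differ from what PAL computes.

```r
# Illustrative helpers only; these names are not part of hanaml or PAL.

# Min-max: map [min(x), max(x)] linearly onto [new.min, new.max].
min_max <- function(x, new.min = 0, new.max = 1) {
  rng <- range(x)
  (x - rng[1]) / (rng[2] - rng[1]) * (new.max - new.min) + new.min
}

# Z-score variants: centre on a location estimate, divide by a spread estimate.
z_score <- function(x, method = c("mean.standard", "mean.mean", "median.median")) {
  method <- match.arg(method)
  switch(method,
    # sd() uses the n - 1 denominator (assumption; PAL may use another one)
    mean.standard = (x - mean(x)) / sd(x),
    # mean absolute deviation around the mean
    mean.mean     = (x - mean(x)) / mean(abs(x - mean(x))),
    # median absolute deviation around the median
    median.median = (x - median(x)) / median(abs(x - median(x)))
  )
}

# Decimal scaling: divide by 10^j, with j the smallest integer
# such that max(|x|) / 10^j < 1.
decimal_scale <- function(x) {
  m <- max(abs(x))
  j <- if (m == 0) 0 else floor(log10(m)) + 1
  x / 10^j
}

x1 <- c(6.0, 12.1, 13.5, 15.4, 10.2)
min_max(x1)        # smallest value maps to 0, largest to 1
z_score(x1)        # centred on the mean
decimal_scale(x1)  # divided by 100 here, since max(|x1|) is 15.4
```

Note that each formula divides by a spread term (range, deviation) that is zero for a constant column, which is the situation `division.by.zero.handler` addresses.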

Examples

Input DataFrame data:

 > data$Collect()
     ID   X1   X2
   1  0  6.0  9.0
   2  1 12.1  8.3
   3  2 13.5 15.3
   4  3 15.4 18.7
   5  4 10.2 19.8

Call the function:

fn <- hanaml.FeatureNormalizer(data = data,
                               key = "ID",
                               method = "min.max",
                               new.max = 1.0,
                               new.min = 0.0)

Output:

> fn$result$Collect()
     ID        X1         X2
  1   0 0.0000000 0.03317536
  2   1 0.1865443 0.00000000
  3   2 0.2293578 0.33175355
  4   3 0.2874618 0.49289100
  5   4 0.1284404 0.54502370
  6   5 0.5290520 0.58293839
  7   6 0.5626911 0.75829384
  8   7 0.7522936 0.80568720
  9   8 0.8103976 0.91469194
  10  9 0.5993884 0.95734597
  11 10 1.0000000 1.00000000
  12 11 1.0000000 1.00000000
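
The output has twelve rows while only five input rows are printed above, so the column minima and maxima used for scaling cannot be read off the displayed input. As a rough arithmetic check in base R, the sketch below assumes column ranges of [6.0, 38.7] for X1 and [8.3, 29.4] for X2. These bounds are inferred from the output values, not stated in the source, and `rescale` is a hypothetical helper rather than part of hanaml.

```r
# Hypothetical helper: min-max scaling with explicitly supplied column bounds.
rescale <- function(x, lo, hi, new.min = 0, new.max = 1) {
  (x - lo) / (hi - lo) * (new.max - new.min) + new.min
}

# The five input rows printed in the example.
x1 <- c(6.0, 12.1, 13.5, 15.4, 10.2)
x2 <- c(9.0, 8.3, 15.3, 18.7, 19.8)

# Assumed bounds, inferred from the output (not given in the source).
x1_scaled <- rescale(x1, lo = 6.0, hi = 38.7)
x2_scaled <- rescale(x2, lo = 8.3, hi = 29.4)

# Compare against the first five rows of fn$result shown above.
data.frame(ID = 0:4, X1 = round(x1_scaled, 7), X2 = round(x2_scaled, 8))
```

Under those assumed bounds, the rescaled values agree with the first five rows of the result to the printed precision.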
