R: Feature Normalizer

hanaml.FeatureNormalizer {hana.ml.r}

R Documentation

Feature Normalizer

Description

hanaml.FeatureNormalizer is an R wrapper for the PAL scale algorithm.

Usage

hanaml.FeatureNormalizer(conn.context, method = NULL, data = NULL,
                         features = NULL, key = NULL,
                         z.score.method = NULL, new.max = NULL,
                         new.min = NULL, thread.ratio = NULL,
                         division.by.zero.handler = NULL)

Arguments

`conn.context`	`ConnectionContext` The connection to the SAP HANA system.
`data`	`DataFrame` DataFrame containing the data.
`key`	`character` Name of the ID column of data.
`features`	`list of character, optional` Names of the feature columns. If `features` is not provided, it defaults to all non-ID, non-label columns.
`method`	`{'min.max', 'z.score', 'decimal'}, optional` Invokes one of the following scaling methods: `'min.max'` - Min-max normalization. `'z.score'` - Z-score normalization. `'decimal'` - Decimal scaling normalization.
`z.score.method`	`{'mean.standard', 'mean.mean', 'median.median'}, optional` Only valid when `method` is 'z.score'. `'mean.standard'` - Mean-standard deviation. `'mean.mean'` - Mean-mean deviation. `'median.median'` - Median-median absolute deviation.
`new.max`	`double, optional` The new maximum value for min-max normalization. Only valid when `method` is 'min.max'.
`new.min`	`double, optional` The new minimum value for min-max normalization. Only valid when `method` is 'min.max'.
`thread.ratio`	`double, optional` Controls the proportion of available threads to use. The value range is from 0 to 1, where 0 indicates a single thread, and 1 indicates up to all available threads. Values between 0 and 1 will use that percentage of available threads. Defaults to 0.
`division.by.zero.handler`	`c("ignore", "throw.error"), optional` Specifies what to do when encountering a division by zero. "ignore": ignores the column when encountering a division by zero. "throw.error": throws an error when encountering a division by zero.
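
A division by zero typically arises when a column is constant, so that its range (or deviation) is zero. The following base R sketch illustrates the two handler options on a min-max scaling. The helper `min_max_checked` is hypothetical and not part of hana.ml.r, its default handler is arbitrary, and treating "ignore" as leaving the column unscaled is an assumption about what "ignores the column" means.

```r
# Hypothetical helper: min-max scaling with a division-by-zero handler.
min_max_checked <- function(x,
                            division.by.zero.handler = c("ignore", "throw.error")) {
  handler <- match.arg(division.by.zero.handler)
  span <- max(x) - min(x)
  if (span == 0) {
    # A constant column has zero range, so scaling would divide by zero.
    if (handler == "throw.error") {
      stop("division by zero: column is constant")
    }
    return(x)  # assumption: "ignore" leaves the column unscaled
  }
  (x - min(x)) / span
}

constant <- c(5, 5, 5)
print(min_max_checked(constant, "ignore"))

# Capture the error message instead of aborting the script.
outcome <- tryCatch(min_max_checked(constant, "throw.error"),
                    error = function(e) conditionMessage(e))
print(outcome)
```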

Format

R6Class object.

Details

Class that normalizes input data and generates a scaling model using one of three scaling methods: min-max normalization, z-score normalization, and decimal scaling normalization. The transform function can then be used to transform a given DataFrame.
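
To make the three methods concrete, here is a minimal base R sketch of the usual textbook formulas. It is an illustration only, not the PAL implementation: the helper names `min_max`, `z_score` and `decimal_scale` are hypothetical, the use of the sample standard deviation for 'mean.standard' is an assumption, and the exact definitions used by PAL may differ.

```r
# Base R sketch of the three scaling formulas (not the PAL implementation).
min_max <- function(x, new.min = 0, new.max = 1) {
  # Map [min(x), max(x)] linearly onto [new.min, new.max].
  (x - min(x)) / (max(x) - min(x)) * (new.max - new.min) + new.min
}

z_score <- function(x, z.score.method = c("mean.standard", "mean.mean",
                                          "median.median")) {
  z.score.method <- match.arg(z.score.method)
  switch(z.score.method,
    # Assumption: sample standard deviation, as computed by sd().
    mean.standard = (x - mean(x)) / sd(x),
    # Mean absolute deviation around the mean.
    mean.mean = (x - mean(x)) / mean(abs(x - mean(x))),
    # Median absolute deviation around the median (no consistency constant).
    median.median = (x - median(x)) / median(abs(x - median(x)))
  )
}

decimal_scale <- function(x) {
  # Smallest integer j such that max(|x|) / 10^j < 1.
  j <- floor(log10(max(abs(x)))) + 1
  x / 10^j
}

x <- c(2, 4, 6, 12)
print(min_max(x))
print(z_score(x, "mean.mean"))
print(z_score(x, "median.median"))
print(decimal_scale(x))
```

Passing `new.min` and `new.max` to `min_max` mirrors the role of the `new.min` and `new.max` arguments described above.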

Value

Returns a "FeatureNormalizer" object with the following values:

result : DataFrame
Scaled dataset from fit and fit_transform methods.
- DATA_ID: name as shown in the input DataFrame.
- DATA_FEATURES: names as shown in the input table columns.
model : DataFrame
Trained model content, structured as follows:
- ID: Scaling model ID.
- MODEL_CONTENT: Scaling model saved as a JSON string.
The table must be a column table. The minimum length of each unit (row) is 5000.
statistics : DataFrame
Statistic results, structured as follows:
- STAT_NAME: statistic name.
- STAT_VALUE: statistic value.

Examples

## Not run: 
Input DataFrame data for training (only the rows with ID 0 to 4 are shown):
> data$Collect()
  ID   X1   X2
1  0  6.0  9.0
2  1 12.1  8.3
3  2 13.5 15.3
4  3 15.4 18.7
5  4 10.2 19.8
Generating a feature normalizer model:
fn <- hanaml.FeatureNormalizer(conn, data = data, key = "ID",
                               method = "min.max", new.max = 1.0, new.min = 0.0)
> fn$result$Collect()
   ID        X1         X2
1   0 0.0000000 0.03317536
2   1 0.1865443 0.00000000
3   2 0.2293578 0.33175355
4   3 0.2874618 0.49289100
5   4 0.1284404 0.54502370
6   5 0.5290520 0.58293839
7   6 0.5626911 0.75829384
8   7 0.7522936 0.80568720
9   8 0.8103976 0.91469194
10  9 0.5993884 0.95734597
11 10 1.0000000 1.00000000
12 11 1.0000000 1.00000000


## End(Not run)
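
The same call pattern applies to the other methods. The sketch below wraps a z-score call in a hypothetical helper, `fit_zscore`, so that it can be defined without a live connection. The helper name and the chosen argument values are illustrative assumptions; only arguments listed under Arguments are used, and `conn` and `data` are assumed to be a ConnectionContext and a DataFrame as in the example above.

```r
# Hypothetical helper: fits a z-score normalizer using documented arguments.
# Calling it requires hana.ml.r and a live SAP HANA connection.
fit_zscore <- function(conn, data) {
  hanaml.FeatureNormalizer(conn, data = data, key = "ID",
                           method = "z.score",
                           z.score.method = "mean.standard",
                           thread.ratio = 0.5,
                           division.by.zero.handler = "throw.error")
}
```

With a connection available, `fn <- fit_zscore(conn, data)` would return an object whose `fn$result`, `fn$model` and `fn$statistics` values are the DataFrames described under Value.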

[Package hana.ml.r version 1.0.8 Index]