hanaml.FeatureNormalizer is an R wrapper for the SAP HANA PAL scale algorithm.

hanaml.FeatureNormalizer(
  method = NULL,
  data = NULL,
  features = NULL,
  key = NULL,
  z.score.method = NULL,
  new.max = NULL,
  new.min = NULL,
  thread.ratio = NULL,
  division.by.zero.handler = NULL
)

Arguments

method

{"min.max", "z.score", "decimal"}, optional
Invokes one of the following scaling methods:

  • "min.max" - Min-max normalization.

  • "z.score" - Z-score normalization.

  • "decimal" - Decimal scaling normalization.
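As a rough illustration of the "decimal" method, the base R sketch below divides a column by a power of 10 so that every absolute value falls below 1. The helper name decimal_scale is hypothetical and not part of hanaml, and choosing the smallest such exponent is the textbook definition of decimal scaling, assumed here rather than taken from the PAL implementation.

```r
# Decimal scaling sketch (hypothetical helper, not part of hanaml).
# Assumption: divide by 10^j, where j is the smallest integer such that
# max(abs(x)) / 10^j < 1.
decimal_scale <- function(x) {
  max_abs <- max(abs(x))
  if (max_abs == 0) {
    return(x)  # all zeros: nothing to scale
  }
  j <- floor(log10(max_abs)) + 1
  x / 10^j
}

# X1 values from the first rows of the example data
x1 <- c(6.0, 12.1, 13.5, 15.4, 10.2)
scaled <- decimal_scale(x1)
print(scaled)
```

Here the largest absolute value is 15.4, so the exponent is 2 and every value is divided by 100.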

data

DataFrame
DataFrame containing the data.

features

character or list of characters, optional
Names of the feature columns.
If not provided, it defaults to all non-key, non-label columns of data.

key

character
Name of the ID column.

z.score.method

{"mean.standard", "mean.mean", "median.median"}, optional
Only valid when method is "z.score".

  • "mean.standard" - Mean-Standard deviation.

  • "mean.mean" - Mean-Mean deviation.

  • "median.median" - Median-Median absolute deviation.
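The three z-score variants differ in which center and which spread they use. The base R sketch below is one plausible reading of the option names: center by the mean or the median, then divide by the standard deviation, the mean absolute deviation, or the median absolute deviation. The helper z_score is hypothetical, and the use of the sample standard deviation (R's sd()) is an assumption; PAL may use a different denominator.

```r
# Z-score variants sketch (hypothetical helper, not part of hanaml).
z_score <- function(x,
                    z.score.method = c("mean.standard", "mean.mean",
                                       "median.median")) {
  z.score.method <- match.arg(z.score.method)
  switch(z.score.method,
    # Assumption: sample standard deviation, as computed by sd()
    mean.standard = (x - mean(x)) / sd(x),
    # Mean absolute deviation around the mean
    mean.mean     = (x - mean(x)) / mean(abs(x - mean(x))),
    # Raw median absolute deviation; stats::mad() is avoided because it
    # multiplies by 1.4826 by default
    median.median = (x - median(x)) / median(abs(x - median(x)))
  )
}

x1 <- c(6.0, 12.1, 13.5, 15.4, 10.2)
z_std <- z_score(x1, "mean.standard")
z_mm  <- z_score(x1, "mean.mean")
z_med <- z_score(x1, "median.median")
```

Each variant yields a column whose chosen center is 0 and whose chosen spread is 1, which is a quick way to sanity-check the result.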

new.max

double, optional
The new maximum value for min-max normalization. Mandatory and valid only when method is "min.max".

new.min

double, optional
The new minimum value for min-max normalization. Mandatory and valid only when method is "min.max".
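To show how new.max and new.min interact, here is a minimal base R sketch of the usual min-max formula, x' = (x - min) / (max - min) * (new.max - new.min) + new.min. The helper min_max is hypothetical and only mirrors the parameter names; it is assumed, not confirmed, that PAL applies exactly this formula.

```r
# Min-max scaling sketch (hypothetical helper, not part of hanaml).
# Assumes the column is not constant, so max(x) - min(x) is non-zero.
min_max <- function(x, new.min = 0, new.max = 1) {
  rng <- max(x) - min(x)
  (x - min(x)) / rng * (new.max - new.min) + new.min
}

# X2 values from the first rows of the example data
x2 <- c(9.0, 8.3, 15.3, 18.7, 19.8)

scaled01   <- min_max(x2, new.min = 0, new.max = 1)   # maps onto [0, 1]
scaled_sym <- min_max(x2, new.min = -1, new.max = 1)  # maps onto [-1, 1]
```

The smallest input always maps to new.min and the largest to new.max; all other values keep their relative position in between.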

thread.ratio

double, optional
Controls the proportion of available threads that can be used by this function.
The value range is from 0 to 1, where 0 indicates a single thread, and 1 indicates all available threads.
Values between 0 and 1 will use up to that percentage of available threads. Values outside this range are ignored.
Defaults to 0.

division.by.zero.handler

{"ignore", "throw.error"}, optional
Specifies what to do when encountering a division by zero.

  • "ignore": ignores the column when encountering a division by zero.

  • "throw.error": throws an error when encountering a division by zero.
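A division by zero typically arises when a column is constant, so its range (or its deviation) is 0. The base R sketch below imitates the two handler options for min-max scaling. The helper scale_column is hypothetical, and treating "ignore" as "return the column unscaled" is an assumption about what ignoring a column means.

```r
# Division-by-zero handling sketch (hypothetical helper, not part of hanaml).
scale_column <- function(x,
                         division.by.zero.handler = c("ignore",
                                                      "throw.error")) {
  handler <- match.arg(division.by.zero.handler)
  rng <- max(x) - min(x)
  if (rng == 0) {
    if (handler == "throw.error") {
      stop("division by zero: column is constant")
    }
    return(x)  # "ignore": assumed to leave the column unscaled
  }
  (x - min(x)) / rng
}

constant <- c(5, 5, 5)

ignored <- scale_column(constant, "ignore")

# tryCatch turns the error into a message string for inspection
outcome <- tryCatch(
  scale_column(constant, "throw.error"),
  error = function(e) conditionMessage(e)
)
```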

Value

Returns a "FeatureNormalizer" object with the following values:

  • result : DataFrame
    Scaled dataset produced by fitting the normalizer to data, structured as follows:

    • DATA_ID : ID column, named as in the input DataFrame.

    • DATA_FEATURES : feature columns, named as in the input DataFrame.

  • model : DataFrame
    Trained model content, structured as follows:

    • ID : Scaling model ID

    • MODEL_CONTENT : Scaling model saved as JSON string. The table must be a column table. The minimum length of each unit (row) is 5000.

  • statistics : DataFrame
    Statistic results, structured as follows:

    • STAT_NAME : statistic name.

    • STAT_VALUE : statistic value.

Details

Normalizes the input data and generates a scaling model using one of three scaling methods: min-max normalization, z-score normalization, or decimal scaling normalization. The transform function can then be used to apply the scaling model to a given DataFrame.

Examples

Input DataFrame data (only the first five rows are shown; the output below covers all twelve):


 > data$Collect()
   ID   X1   X2
 1  0  6.0  9.0
 2  1 12.1  8.3
 3  2 13.5 15.3
 4  3 15.4 18.7
 5  4 10.2 19.8
   

Call the function:

fn <- hanaml.FeatureNormalizer(data = data,
                               key = "ID",
                               method = "min.max",
                               new.max = 1.0,
                               new.min = 0.0)

Output:


> fn$result$Collect()
   ID        X1         X2
1   0 0.0000000 0.03317536
2   1 0.1865443 0.00000000
3   2 0.2293578 0.33175355
4   3 0.2874618 0.49289100
5   4 0.1284404 0.54502370
6   5 0.5290520 0.58293839
7   6 0.5626911 0.75829384
8   7 0.7522936 0.80568720
9   8 0.8103976 0.91469194
10  9 0.5993884 0.95734597
11 10 1.0000000 1.00000000
12 11 1.0000000 1.00000000