HANA Binning

Properties that can be configured for the Binning Preparation Component in HANA scenarios.

Binning, also known as discretization, smooths sorted data values. It divides the range of a numerical variable into a set of subranges called bins and replaces each value with its bin number. Binning data before running certain algorithms, such as the decision tree algorithm, helps reduce the complexity of the model.
There are four binning methods:
  • Equal widths based on number of bins
  • Equal widths based on bin width
  • Equal depth
  • Deviation from mean
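The first three methods can be sketched in a few lines of Python. This is a minimal illustration, not the HANA implementation: the function names are hypothetical, and it assumes bins start at the minimum value and are numbered from 1. The HANA component may place bin boundaries differently, so its bin numbers can differ from what this sketch produces. Deviation from mean is left out because its boundary rule is not described here.

```python
def equal_width_by_count(values, num_bins):
    """Assign 1-based bin numbers using num_bins bins of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    if width == 0:
        # All values are identical, so everything lands in the first bin.
        return [1 for _ in values]
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width) + 1, num_bins) for v in values]


def equal_width_by_width(values, bin_width):
    """Assign 1-based bin numbers using fixed-width bins starting at the minimum."""
    lo = min(values)
    return [int((v - lo) // bin_width) + 1 for v in values]


def equal_depth(values, num_bins):
    """Assign 1-based bin numbers so each bin holds roughly the same number of values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * num_bins // len(values) + 1
    return bins


temps = [6, 12, 13, 15, 10, 23, 24, 30, 32, 25, 38]
print(equal_width_by_count(temps, 4))
print(equal_width_by_width(temps, 10))
print(equal_depth(temps, 4))
```

Equal widths fix the size of each subrange and let the number of values per bin vary, while equal depth fixes the number of values per bin and lets the subrange sizes vary.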
There are also three smoothing methods:
  • Smoothing by bin means: each value in a bin is replaced by the mean of the values in that bin.
  • Smoothing by bin medians: each value in a bin is replaced by the median of the values in that bin.
  • Smoothing by bin boundaries: the minimum and maximum values in a given bin are identified as the bin boundaries. Each bin value is then replaced by its closest boundary value.
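The three smoothing methods can be illustrated with a short Python sketch. The `smooth` function, its `method` argument, and the sample values are hypothetical and chosen only for illustration. How the HANA component breaks ties between two equally close boundaries is not stated here, so the sketch picks the lower boundary.

```python
from statistics import mean, median


def smooth(values, bins, method="mean"):
    """Replace each value using the values that share its bin number."""
    groups = {}
    for v, b in zip(values, bins):
        groups.setdefault(b, []).append(v)

    def replacement(v, members):
        if method == "mean":
            return mean(members)
        if method == "median":
            return median(members)
        if method == "boundaries":
            lo, hi = min(members), max(members)
            # Ties go to the lower boundary (an arbitrary choice for this sketch).
            return lo if v - lo <= hi - v else hi
        raise ValueError(f"unknown smoothing method: {method}")

    return [replacement(v, groups[b]) for v, b in zip(values, bins)]


# Made-up sample: six values already assigned to two bins.
values = [4, 8, 15, 21, 21, 24]
bins = [1, 1, 1, 2, 2, 2]
print(smooth(values, bins, "mean"))
print(smooth(values, bins, "median"))
print(smooth(values, bins, "boundaries"))
```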
HANA Binning properties
Table 1: Data Preparation Component Properties
Property                      Description
Independent Column            Select the input source column on which you want to perform binning.
Missing values                Select the method for handling missing values.
                              Possible methods:
                                • Ignore: The algorithm skips the records containing missing values in the independent or dependent columns.
                                • Keep: Retains missing values.
Binning method                Select the Binning Method.
Number of Bins                Enter the number of bins needed.
Smoothing Method              Select the Smoothing Method.
Binned Column Name            Enter a name for the new column that contains bin numbers.
Smoothed Values Column Names  Enter the name for the new column that contains smoothed values.
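
The two Missing values options can be pictured with a small Python sketch. The `handle_missing` function and the sample records are hypothetical. The sketch only shows which records reach the binning step; how retained missing values appear in the binned and smoothed columns is not described here.

```python
def handle_missing(rows, column, method="Ignore"):
    """Apply a missing-value policy to a list of dict records before binning."""
    if method == "Ignore":
        # Skip records whose value in the chosen column is missing.
        return [r for r in rows if r.get(column) is not None]
    if method == "Keep":
        # Retain every record; missing values pass through unchanged.
        return list(rows)
    raise ValueError(f"unknown method: {method}")


# Made-up records: the second one has no temperature.
rows = [
    {"City": "Amsterdam", "Temperature": 6},
    {"City": "Unknown", "Temperature": None},
    {"City": "Dubai", "Temperature": 38},
]
print(handle_missing(rows, "Temperature", "Ignore"))
print(handle_missing(rows, "Temperature", "Keep"))
```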

Example:

Binning of data in a dataset
City            Temperature
Amsterdam       6
Frankfurt       12
Guangzhou       13
Cape Town       15
Waldorf         10
Bangalore       23
Mumbai          24
Miami           30
Rio De Janeiro  32
Sydney          25
Dubai           38
To bin the Temperature column by equal widths based on the number of bins and smooth the result by bin means, perform the following steps:
  1. Drag the Binning component onto the analysis editor.
  2. Double-click Binning, or hover over Binning and choose Configure Properties.
  3. In the Independent Column dropdown list, select a column, for example, Temperature.
    Note: You can select only columns that contain numeric values.
  4. In the Missing values dropdown list, choose Ignore.
  5. In Binning Method, choose Equal widths based on the number of bins.
  6. In Number of Bins, enter 4.
  7. Select Smoothing Required.
  8. In Smoothing Method, choose Bin Mean.
  9. Under Enter name for newly added column, in Binned Column Name, enter Temperature Bin.
    Note: You can name the column according to your preference or analysis requirements. This column contains the bin number for each record.
  10. Under Enter name for newly added column, in Smoothed Values Column Names, enter Temperature Smooth.
    Note: You can name the column according to your preference or analysis requirements. This column contains the smoothed value for each record.
Output Table:
City            Temperature  Temperature Bin  Temperature Smooth
Amsterdam       6            1                8.0
Frankfurt       12           2                13.33333
Guangzhou       13           2                13.33333
Cape Town       15           2                13.33333
Waldorf         10           1                8.0
Bangalore       23           3                25.5
Mumbai          24           3                25.5
Miami           30           3                25.5
Rio De Janeiro  32           4                35.0
Sydney          25           3                25.5
Dubai           38           4                35.0
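
The Temperature Smooth column can be checked by hand: each value is the mean of the temperatures that share a bin number. The following Python sketch takes the bin numbers from the output table as given and recomputes the means; the variable names are illustrative. The bin numbers shown are consistent with boundaries at 10, 20, and 30, where a value equal to a boundary falls into the lower bin, but the rule the component uses to choose those boundaries is not stated here.

```python
from collections import defaultdict

# (city, temperature, bin number) copied from the output table.
rows = [
    ("Amsterdam", 6, 1),
    ("Frankfurt", 12, 2),
    ("Guangzhou", 13, 2),
    ("Cape Town", 15, 2),
    ("Waldorf", 10, 1),
    ("Bangalore", 23, 3),
    ("Mumbai", 24, 3),
    ("Miami", 30, 3),
    ("Rio De Janeiro", 32, 4),
    ("Sydney", 25, 3),
    ("Dubai", 38, 4),
]

# Collect the temperatures that fall into each bin.
groups = defaultdict(list)
for _, temp, bin_no in rows:
    groups[bin_no].append(temp)

# Smoothing by bin means: every value is replaced by the mean of its bin.
bin_means = {b: sum(ts) / len(ts) for b, ts in groups.items()}
smoothed = [round(bin_means[bin_no], 5) for _, _, bin_no in rows]
print(smoothed)
```

For example, bin 2 holds 12, 13, and 15, so its mean is 40 / 3, which rounds to 13.33333.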