Properties that can be configured for the Binning Preparation Component in HANA
scenarios.
Syntax
Binning, also known as discretization, smooths sorted data values. It divides the range of a
numerical variable into a set of subranges called bins, and replaces each value with its
bin number. Binning data before running certain algorithms, such as the decision tree
algorithm, helps reduce the complexity of the model.
There are four binning methods:
- Equal widths based on number of bins
- Equal widths based on bin width
- Equal depth
- Deviation from mean
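The first three methods can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the component's implementation: the function names, the 1-based bin numbering, the choice of the minimum value as the starting edge, and the sample data are all hypothetical, and the component may place bin boundaries differently. Deviation from mean is left out because its exact definition is not given here.

```python
def equal_width_by_count(values, n_bins):
    """Split [min, max] into n_bins equal-width intervals; return 1-based bin numbers."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [1 for _ in values]
    # The maximum value lands exactly on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width) + 1, n_bins) for v in values]


def equal_width_by_width(values, width):
    """Use a fixed bin width, starting at the minimum value (an assumption)."""
    lo = min(values)
    return [int((v - lo) // width) + 1 for v in values]


def equal_depth(values, n_bins):
    """Rank the values and give each bin roughly the same number of records."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values) + 1
    return bins


data = [2, 4, 5, 9, 12, 18, 20, 30]  # hypothetical sample values
by_count = equal_width_by_count(data, 4)   # [1, 1, 1, 2, 2, 3, 3, 4]
by_width = equal_width_by_width(data, 10)  # [1, 1, 1, 1, 2, 2, 2, 3]
by_depth = equal_depth(data, 4)            # [1, 1, 2, 2, 3, 3, 4, 4]
```

Note how equal-width bins can hold very different numbers of records, while equal-depth bins hold the same number of records but span ranges of different widths.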
Also, there are three methods for smoothing:
- Smoothing by bin means: each value in a bin is replaced by the mean of the values in
that bin.
- Smoothing by bin medians: each value in a bin is replaced by the bin median.
- Smoothing by bin boundaries: the minimum and maximum values in a given bin
are identified as the bin boundaries. Each bin value is then replaced by its
closest boundary value.
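As a rough illustration of the three smoothing methods, the sketch below applies each one to the values of a single bin. The function names and the sample values are hypothetical, and the tie-breaking rule for values equally close to both boundaries (lower boundary wins) is an assumption, since the description above does not specify one.

```python
from statistics import mean, median


def smooth_by_mean(bin_values):
    """Replace every value in the bin with the bin mean."""
    m = mean(bin_values)
    return [m for _ in bin_values]


def smooth_by_median(bin_values):
    """Replace every value in the bin with the bin median."""
    m = median(bin_values)
    return [m for _ in bin_values]


def smooth_by_boundaries(bin_values):
    """Replace every value with the closer of the bin's minimum and maximum."""
    lo, hi = min(bin_values), max(bin_values)
    # Assumption: a value equally close to both boundaries goes to the lower one.
    return [lo if v - lo <= hi - v else hi for v in bin_values]


bin_values = [21, 24, 25, 28, 34]  # hypothetical contents of one bin
by_mean = smooth_by_mean(bin_values)              # every value becomes 26.4
by_median = smooth_by_median(bin_values)          # every value becomes 25
by_boundary = smooth_by_boundaries(bin_values)    # [21, 21, 21, 34, 34]
```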
HANA Binning properties
Table 1: Data Preparation Component Properties
| Property | Description |
| Independent Column | Select the input source column on which you want to perform binning. |
| Missing values | Select the method for handling missing values. Possible methods: Ignore (the algorithm skips the records containing missing values in the independent or dependent columns) or Keep (retains missing values). |
| Binning method | Select the Binning Method. |
| Number of Bins | Enter the number of bins needed. |
| Smoothing Method | Select the Smoothing Method. |
| Binned Column Name | Enter a name for the new column that contains bin numbers. |
| Smoothed Values Column Names | Enter the name for the new column that contains smoothed values. |
Example:
Binning of data in a dataset
| City | Temperature |
| Amsterdam | 6 |
| Frankfurt | 12 |
| Guangzhou | 13 |
| Cape Town | 15 |
| Waldorf | 10 |
| Bangalore | 23 |
| Mumbai | 24 |
| Miami | 30 |
| Rio De Janeiro | 32 |
| Sydney | 25 |
| Dubai | 38 |
To bin the Temperature column by equal widths based on the number of
bins and apply smoothing by bin means, perform the following steps:
- Drag the Binning component onto the analysis editor.
- Double-click Binning, or hover over Binning and choose Configure Properties.
- In the Independent Column dropdown list, select a column, for example, Temperature.
Note: You can only select columns that contain numeric values.
- In the Missing values dropdown list, choose Ignore.
- In Binning Method, choose Equal widths based on number of bins.
- In Number of Bins, enter 4.
- Select Smoothing Required.
- In Smoothing Method, choose Bin Mean.
- Under Enter name for newly added column, in Binned Column Name, enter Temperature Bin.
Note: You can name the column based on your preference or analysis requirement. This
column contains the bin number.
- Under Enter name for newly added column, in Smoothed Values Column Names, enter
Temperature Smooth.
Note: You can name the column based on your preference or analysis requirement. This
column contains the smoothed value.
Output Table:
| City | Temperature | Temperature Bin | Temperature Smooth |
| Amsterdam | 6 | 1 | 8.0 |
| Frankfurt | 12 | 2 | 13.33333 |
| Guangzhou | 13 | 2 | 13.33333 |
| Cape Town | 15 | 2 | 13.33333 |
| Waldorf | 10 | 1 | 8.0 |
| Bangalore | 23 | 3 | 25.5 |
| Mumbai | 24 | 3 | 25.5 |
| Miami | 30 | 3 | 25.5 |
| Rio De Janeiro | 32 | 4 | 35.0 |
| Sydney | 25 | 3 | 25.5 |
| Dubai | 38 | 4 | 35.0 |