You can configure properties for the Normalization Preparation Component in HANA and
non-HANA scenarios.
Use this component to normalize the attribute data. HANA Normalization scales attribute data with large values so that it falls within a specific range, such as -1.0 to 1.0, or 0.0 to 1.0. You can use this component for in-database analysis. Normalizing data is useful for classification algorithms involving neural networks, and for distance-based methods such as nearest-neighbor classification and clustering.
Note: If you want the processed data to replace the existing column, select Replace column.
The normalization component supports the following normalization methods:
- Min-Max normalization: Performs a linear transformation on the original data values and scales each value to fit within a specific range. When performing Min-Max normalization, you can specify the New Maximum and New Minimum values. This normalization is helpful for ensuring that extreme values are constrained within a fixed range.
  Note: The New Maximum value must be greater than the New Minimum value.
- Z-score normalization: Computed based on the mean and standard deviation of each attribute. This normalization is useful for determining whether a specific value is above or below average, and by how much.
- Decimal scaling normalization: The decimal point of the values of each attribute is moved according to the attribute's maximum absolute value.
Note: Select Replace column if you want the normalized data to replace the existing data in the column on which normalization is performed.
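The three methods can be sketched in a few lines of Python to make the arithmetic concrete. This is a minimal illustration of the textbook definitions, not the component's actual implementation: the function names are hypothetical, the use of the population standard deviation for Z-score is an assumption, and the handling of constant columns is a choice made here for the sketch.

```python
from statistics import mean, pstdev


def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so that they span [new_min, new_max]."""
    if new_max <= new_min:
        raise ValueError("New Maximum must be greater than New Minimum")
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to scale.
        # Mapping everything to new_min is a choice made for this sketch.
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


def z_score(values):
    """Express each value as a number of standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def decimal_scaling(values):
    """Divide by the smallest power of 10 that brings every |value| below 1."""
    max_abs = max(abs(v) for v in values)
    power = 0
    while max_abs / 10 ** power >= 1:
        power += 1
    return [v / 10 ** power for v in values]
```

For instance, `min_max([10, 20, 30], 0.0, 1.0)` maps the smallest value to 0.0 and the largest to 1.0, while `decimal_scaling([66, 504])` divides both values by 1000 because the maximum absolute value has three digits before the decimal point.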
Example: Normalizing the time taken to cover a certain distance.
Table:
| Name | Distance (in meters) | Time (in seconds) |
|---|---|---|
| Laura | 500 | 66 |
| Desy | 500 | 360 |
| Alex | 500 | 201 |
| John | 500 | 78 |
| Ted | 500 | 504 |
To normalize the time column using Min-Max normalization, perform the following steps:
- In the Predict view, from the Component List, choose the Data Preparation tab.
- Drag the HANA Normalization component onto the analysis editor, or double-click HANA Normalization.
- Double-click HANA Normalization, or hover the mouse pointer over HANA Normalization and choose Configure Properties.
- Select the columns you want to normalize, for example, Time (in seconds).
  Note: You can only select columns with numerical values.
- From the Normalization Type drop-down list, choose Min-Max.
- Enter values for the New Maximum and the New Minimum.
- Choose Done, and then choose Run.
Output table:
| Name | Distance (in meters) | Time (in seconds) | Time (in seconds)_Normalized |
|---|---|---|---|
| Laura | 500 | 66 | 0.05 |
| Desy | 500 | 360 | 0.30 |
| Alex | 500 | 201 | 0.17 |
| John | 500 | 78 | 0.06 |
| Ted | 500 | 504 | 0.42 |
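As a rough cross-check outside the tool, the Min-Max calculation can be applied to the Time (in seconds) column in plain Python. The example does not state which New Minimum and New Maximum values were entered, so the sketch below assumes 0.0 and 1.0; its results depend on that assumed range and are not expected to reproduce the output table above.

```python
# Time (in seconds) per person, taken from the input table.
times = {"Laura": 66, "Desy": 360, "Alex": 201, "John": 78, "Ted": 504}

# Assumed target range; the example does not say which values were entered.
new_min, new_max = 0.0, 1.0

lo, hi = min(times.values()), max(times.values())

# Linear rescaling: the shortest time maps to new_min, the longest to new_max.
normalized = {
    name: new_min + (t - lo) / (hi - lo) * (new_max - new_min)
    for name, t in times.items()
}

for name, value in normalized.items():
    print(f"{name}: {value:.2f}")
```

With this range, the shortest time (Laura) maps to the New Minimum and the longest time (Ted) maps to the New Maximum; every other value falls proportionally in between.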
Perform the same steps for Z-score normalization and Decimal Scaling normalization as described for Min-Max normalization. However, for Z-score normalization and Decimal Scaling normalization, you do not have to enter the New Maximum and New Minimum values.
Z-score normalization output:
Output table:
| Name | Distance (in meters) | Time (in seconds) |
|---|---|---|
| Laura | 500 | -0.49 |
| Desy | 500 | 1.77 |
| Alex | 500 | 0.55 |
| John | 500 | -0.40 |
| Ted | 500 | 2.88 |
Decimal Scaling normalization output:
Output table:
| Name | Distance (in meters) | Time (in seconds) |
|---|---|---|
| Laura | 500 | 0.01 |
| Desy | 500 | 0.04 |
| Alex | 500 | 0.02 |
| John | 500 | 0.01 |
| Ted | 500 | 0.05 |