QuantileTransform

class hana_ml.algorithms.pal.preprocessing.QuantileTransform(num_quantiles=None, output_distribution=None)

Python wrapper for PAL Quantile Transformer.

Parameters:
num_quantiles : int, optional

Specifies the number of quantiles to be computed.

Defaults to 100.

output_distribution : {'uniform', 'normal'}, optional

Specifies the marginal distribution of the quantile-transformed data.

  • 'uniform': Uniform distribution

  • 'normal': Normal distribution

Defaults to 'uniform'.
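
To make the two parameters concrete, the following is a minimal pure-Python sketch of a quantile transform. It illustrates the general technique only and is not PAL's implementation: the helper names fit_quantiles and transform_value, the linear interpolation between landmarks, the handling of ties, and the clipping constant eps are all assumptions made for this example.

```python
from bisect import bisect_left, bisect_right
from statistics import NormalDist


def fit_quantiles(values, num_quantiles=100):
    """Compute num_quantiles landmarks at evenly spaced probabilities in [0, 1]."""
    if num_quantiles < 2:
        raise ValueError("num_quantiles must be at least 2")
    xs = sorted(values)
    if not xs:
        raise ValueError("values must not be empty")
    last = len(xs) - 1
    landmarks = []
    for i in range(num_quantiles):
        pos = i / (num_quantiles - 1) * last  # fractional rank in the sorted data
        lo = int(pos)
        hi = min(lo + 1, last)
        # Linear interpolation between the two neighbouring sorted values.
        landmarks.append(xs[lo] + (pos - lo) * (xs[hi] - xs[lo]))
    return landmarks


def transform_value(x, landmarks, output_distribution="uniform"):
    """Map x to [0, 1] via the landmarks, then optionally to a standard normal."""
    top = len(landmarks) - 1
    if x <= landmarks[0]:
        u = 0.0
    elif x >= landmarks[-1]:
        u = 1.0
    else:
        left = bisect_left(landmarks, x)
        right = bisect_right(landmarks, x)
        if left < right:
            # x coincides with a run of tied landmarks: use the middle of the run.
            u = (left + right - 1) / 2 / top
        else:
            # x lies strictly between two landmarks: interpolate linearly.
            below, above = landmarks[left - 1], landmarks[left]
            u = (left - 1 + (x - below) / (above - below)) / top
    if output_distribution == "uniform":
        return u
    if output_distribution == "normal":
        eps = 1e-7  # assumed clipping constant; keeps inv_cdf away from 0 and 1
        return NormalDist().inv_cdf(min(max(u, eps), 1.0 - eps))
    raise ValueError("output_distribution must be 'uniform' or 'normal'")


# Usage on the X2 column of the example data shown in the Examples section.
x2 = [1.0, 2.0, 2.0, 2.0, 3.0, 4.0, 4.0, 4.0, 5.0, 5.0, 5.0, 5.0, 5.0, 6.0]
landmarks = fit_quantiles(x2, num_quantiles=200)
print([round(transform_value(v, landmarks), 6) for v in x2])
```

In this sketch, a larger num_quantiles gives a finer approximation of the empirical distribution, while output_distribution only changes the final mapping applied to the uniform value.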

Examples

Input data for applying quantile transform to:

>>> data.collect()
    ID   X1   X2    X3 X4  X5     X6
0    0  0.2  1.0   5.0  A   0   10.0
1    1  1.0  2.0  10.0  B   1  200.0
2    2  2.3  2.0  11.0  C   2  120.0
3    3  0.5  2.0  12.0  A   0  150.0
4    4  1.2  3.0  15.0  C   2  154.0
5    5  3.0  4.0  15.0  A   0  130.0
6    6  2.5  4.0  24.0  C   2  200.0
7    7  1.2  4.0  26.0  B   1   50.0
8    9  1.1  5.0  28.0  C   2  323.0
9   10  0.1  5.0  28.0  C   2  500.0
10  11  0.7  5.0  28.0  A   0  120.0
11  12  0.9  5.0  28.0  C   2  300.0
12  13  0.2  5.0  30.0  A   0  400.0
13  14  0.0  6.0  30.0  B   1  430.0

Create a quantile transformer and fit the training data:

>>> qt = QuantileTransform(num_quantiles=200,
...                        output_distribution='uniform')
>>> qt.fit(data=data, key='ID', features=['X2', 'X6'],
...        categorical_variable='X5')

See the quantile-transformed training data w.r.t. the selected features:

>>> qt.result_.collect()
    ID   X1        X2    X3 X4  X5        X6
0    0  0.2  0.000000   5.0  A   0  0.000000
1    1  1.0  0.153266  10.0  B   1  0.577889
2    2  2.3  0.153266  11.0  C   2  0.190955
3    3  0.5  0.153266  12.0  A   0  0.386199
4    4  1.2  0.307692  15.0  C   2  0.458912
5    5  3.0  0.462312  15.0  A   0  0.307188
6    6  2.5  0.462312  24.0  C   2  0.577889
7    7  1.2  0.462312  26.0  B   1  0.076395
8    8  1.1  0.768844  28.0  C   2  0.768966
9    9  0.1  0.768844  28.0  C   2  1.000000
10  10  0.7  0.768844  28.0  A   0  0.190955
11  11  0.9  0.768844  28.0  C   2  0.693143
12  12  0.2  0.768844  30.0  A   0  0.847317
13  13  0.0  1.000000  30.0  B   1  0.922065

Attributes:
result_ : DataFrame

Training data with selected features quantile-transformed.

model_ : list of DataFrames

The model for transforming subsequent data, consisting of two DataFrames:

  • DataFrame 1: Quantiles for the output distribution.

  • DataFrame 2: Other model info for the Quantile Transformer.

Methods

fit(data[, key, features, categorical_variable])

Fit a quantile transformer to the numerical features.

fit_transform(data[, key, features, ...])

Fit a Quantile Transformer, transform the training data in the same step, and return the result.

transform(data[, key])

Transform the test data using a fitted Quantile Transformer.

fit(data, key=None, features=None, categorical_variable=None)

Fit a quantile transformer to the numerical features.

Parameters:
data : DataFrame

Input data for fitting a quantile-transformation model (Quantile Transformer).

key : str, optional

Specifies the name of the ID column in data.

Mandatory if data is not indexed by a single column; otherwise defaults to the index column of data.

features : str or list of strings, optional

Specifies the names of the columns in data to which the quantile transformation is applied. Categorical columns in features are ignored, since only numerical columns can be quantile-transformed.

Defaults to all numerical columns in data (except key).

categorical_variable : str or list of strings, optional

Specifies the columns in data of type INTEGER that should be treated as categorical.

Defaults to None, i.e. all INTEGER columns in data are treated as continuous features by default.
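
The interplay of key, features and categorical_variable can be sketched in plain Python. This is an interpretation of the rules described above, not code from hana_ml: the function columns_to_transform, the NUMERIC_TYPES set, and the column types assigned to the example table are assumptions made for illustration.

```python
NUMERIC_TYPES = {"INTEGER", "DOUBLE"}  # assumed type names for this sketch


def columns_to_transform(column_types, key, features=None, categorical_variable=None):
    """Return the columns that would be quantile-transformed under the documented rules."""
    if isinstance(features, str):
        features = [features]
    if isinstance(categorical_variable, str):
        categorical_variable = [categorical_variable]
    marked_categorical = set(categorical_variable or [])

    def is_categorical(col):
        col_type = column_types[col]
        # Non-numerical columns are always categorical; INTEGER columns only
        # when they are listed in categorical_variable.
        return col_type not in NUMERIC_TYPES or (
            col_type == "INTEGER" and col in marked_categorical
        )

    # With no explicit features, every column except the key is a candidate.
    candidates = features if features is not None else [c for c in column_types if c != key]
    return [c for c in candidates if c != key and not is_categorical(c)]


# Assumed column types for the example table in the Examples section.
example_types = {
    "ID": "INTEGER", "X1": "DOUBLE", "X2": "DOUBLE", "X3": "DOUBLE",
    "X4": "VARCHAR", "X5": "INTEGER", "X6": "DOUBLE",
}
print(columns_to_transform(example_types, key="ID", categorical_variable="X5"))
```

Under these assumptions, leaving categorical_variable unset would also select the INTEGER column X5, which is why the example above marks it explicitly.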

Returns:
A fitted Python object of class QuantileTransform.

fit_transform(data, key=None, features=None, categorical_variable=None)

Fit a Quantile Transformer, transform the training data in the same step, and return the result.

Parameters:
data : DataFrame

Input data for fitting a quantile-transformation model (Quantile Transformer).

key : str, optional

Specifies the name of the ID column in data.

Mandatory if data is not indexed by a single column; otherwise defaults to the index column of data.

features : str or list of strings, optional

Specifies the names of the columns in data to which the quantile transformation is applied. Categorical columns in features are ignored, since only numerical columns can be quantile-transformed.

Defaults to all numerical columns in data (except key).

categorical_variable : str or list of strings, optional

Specifies the columns in data of type INTEGER that should be treated as categorical.

Defaults to None, i.e. all INTEGER columns in data are treated as continuous features by default.

Returns:
DataFrame

The data with selected features being quantile-transformed.

transform(data, key=None)

Transform the test data using a fitted Quantile Transformer.

Parameters:
data : DataFrame

Input data for applying a trained quantile-transformation model (Quantile Transformer).

Should be structured the same as the data used in the model training phase.

key : str, optional

Specifies the name of the ID column in data.

Mandatory if data is not indexed by a single column; otherwise defaults to the index column of data.

Returns:
DataFrame

Quantile-transformed data w.r.t. the selected (numerical) features.
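
The methods above can be combined as in the following sketch, which relies only on the signatures documented on this page. The function name fit_then_transform and the arguments train_df and test_df are hypothetical; both are assumed to be hana_ml DataFrames with the columns of the example table, created on an open HANA connection. The import sits inside the function so that the definition itself loads without hana_ml installed.

```python
def fit_then_transform(train_df, test_df):
    """Fit a quantile transformer on train_df and apply it to test_df.

    Both arguments are assumed to be hana_ml DataFrames with an 'ID' key
    column and the columns 'X2', 'X5' and 'X6' from the example table.
    """
    # Imported lazily: requires the hana_ml package at call time only.
    from hana_ml.algorithms.pal.preprocessing import QuantileTransform

    qt = QuantileTransform(num_quantiles=200, output_distribution='normal')

    # Fit on the training data and get the transformed training data back.
    train_out = qt.fit_transform(data=train_df, key='ID',
                                 features=['X2', 'X6'],
                                 categorical_variable='X5')

    # Reuse the fitted model on data structured like the training data.
    test_out = qt.transform(data=test_df, key='ID')
    return train_out, test_out
```

Calling collect() on either returned DataFrame, as in the examples above, would fetch the transformed rows for inspection.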

Inherited Methods from PALBase

Besides the methods described above, the QuantileTransform class also inherits methods from the PALBase class. Refer to PAL Base for more details.