Sampling
- class hana_ml.algorithms.pal.preprocessing.Sampling(method, interval=None, sampling_size=None, random_state=None, percentage=None)
This class is used to choose a small portion of the records as representatives.
- Parameters:
- method : str
Specifies the sampling method.
Valid options include: 'first_n', 'middle_n', 'last_n', 'every_nth', 'simple_random_with_replacement', 'simple_random_without_replacement', 'systematic', 'stratified_with_replacement', 'stratified_without_replacement'.
For the random methods, the system time is used as the seed unless random_state is set to a positive value.
- interval : int, optional
The interval between two samples.
Only required when method is 'every_nth'.
If this parameter is not specified, the sampling_size parameter will be used.
- sampling_size : int, optional
Number of samples.
Defaults to 1.
- random_state : int, optional
Indicates the seed used to initialize the random number generator.
- It can be set to 0 or a positive value, where:
0: Uses the system time
Others: Uses the specified seed
Defaults to 0.
- percentage : float, optional
Percentage of the samples.
Use this parameter when sampling_size is not set.
If both sampling_size and percentage are specified, percentage takes precedence.
Defaults to 0.1.
- Attributes:
- None
Methods
fit_transform(data[, features]): Samples the input dataset under the specified configuration.
Examples
>>> smp = Sampling(method='every_nth', interval=5, sampling_size=8)
>>> res = smp.fit_transform(data=df)
>>> res.collect()
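To make the positional methods more concrete, here is a minimal pure-Python sketch of what 'first_n', 'last_n', and 'every_nth' might select from an ordered list of rows. The helper functions below are hypothetical and are not part of hana_ml. In particular, the choice that 'every_nth' starts at row number `interval` is an assumption for illustration, so the rows actually returned by PAL may differ.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def first_n(rows: Sequence[T], n: int) -> List[T]:
    """Return the first n rows (illustrative stand-in for method='first_n')."""
    return list(rows[:n])


def last_n(rows: Sequence[T], n: int) -> List[T]:
    """Return the last n rows (illustrative stand-in for method='last_n')."""
    # rows[-0:] would return everything, so handle n == 0 explicitly.
    return list(rows[-n:]) if n > 0 else []


def every_nth(rows: Sequence[T], interval: int) -> List[T]:
    """Return every interval-th row.

    Assumption: counting is 1-based and starts at row number `interval`.
    """
    if interval < 1:
        raise ValueError("interval must be a positive integer")
    return list(rows[interval - 1::interval])


rows = list(range(1, 21))      # 20 rows, numbered 1..20
print(first_n(rows, 3))        # [1, 2, 3]
print(last_n(rows, 3))         # [18, 19, 20]
print(every_nth(rows, 5))      # [5, 10, 15, 20]
```

With the actual class, the equivalent selection happens inside the database, and the result is returned as a DataFrame rather than a Python list.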
- fit_transform(data, features=None)
Samples the input dataset under the specified configuration.
- Parameters:
- data : DataFrame
Input DataFrame.
- features : str or list of str, optional
The column(s) used for stratified sampling.
Only required when method is 'stratified_with_replacement' or 'stratified_without_replacement'.
Defaults to None.
- Returns:
- DataFrame
Sampling results, same structure as defined in the input DataFrame.
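The role of features in the stratified methods can be illustrated with a small pure-Python sketch: rows are grouped by the stratification column, and a share of each group is drawn at random. This is only an approximation under stated assumptions, not PAL's implementation. The function `stratified_sample`, the per-stratum rounding rule, and the minimum of one row per stratum are all hypothetical. The seed handling mirrors the random_state convention described above (0 means no fixed seed).

```python
import random
from collections import defaultdict
from typing import Any, Dict, List


def stratified_sample(
    rows: List[Dict[str, Any]],
    key: str,
    percentage: float,
    seed: int = 0,
    replace: bool = False,
) -> List[Dict[str, Any]]:
    """Draw roughly `percentage` of the rows from each stratum.

    Hypothetical helper for illustration; not part of hana_ml.
    """
    # seed == 0 leaves the generator unseeded (OS entropy or time is used),
    # any other value gives a reproducible sequence.
    rng = random.Random(seed if seed != 0 else None)

    # Group rows by the value of the stratification column.
    strata: Dict[Any, List[Dict[str, Any]]] = defaultdict(list)
    for row in rows:
        strata[row[key]].append(row)

    sample: List[Dict[str, Any]] = []
    for members in strata.values():
        # Assumption: round to the nearest integer, at least one row per
        # stratum, and never more rows than the stratum holds.
        k = min(len(members), max(1, round(len(members) * percentage)))
        if replace:
            sample.extend(rng.choices(members, k=k))  # duplicates possible
        else:
            sample.extend(rng.sample(members, k))     # no duplicates
    return sample


rows = [{"id": i, "grp": "A" if i < 6 else "B"} for i in range(10)]
picked = stratified_sample(rows, key="grp", percentage=0.5, seed=42)
print(len(picked))  # 5 rows: 3 from group A and 2 from group B
```

Because each stratum is sampled separately, the proportions of the groups in the input are approximately preserved in the output, which is the main reason to prefer a stratified method over simple random sampling on imbalanced data.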
Inherited Methods from PALBase
Besides the methods described above, the Sampling class also inherits methods from the PALBase class. Please refer to PAL Base for more details.