ImputeTS
- hana_ml.algorithms.pal.preprocessing.ImputeTS(imputation_type=None, base_algorithm=None, alpha=None, extrapolation=None, smooth_width=None, auxiliary_normalitytest=None, thread_ratio=None)
Imputation of multi-dimensional time-series data. This is the Python wrapper for PAL procedure PAL_IMPUTE_TIME_SERIES.
- Parameters
- imputation_type : str, optional
Specifies the overall imputation type for all columns of the time-series data. Valid options include:
'non' : Does nothing; all columns are left untouched.
'most_frequent-allzero' : For any categorical column, fill all missing values with the value that appears most often in that column; for any numerical column, fill all missing values with zero.
'most_frequent-mean' : For any categorical column, fill all missing values with the value that appears most often in that column; for any numerical column, fill all missing values with the column mean.
'most_frequent-median' : For any categorical column, fill all missing values with the value that appears most often in that column; for any numerical column, fill all missing values with the column median.
'most_frequent-sma' : For any categorical column, fill all missing values with the value that appears most often in that column; for any numerical column, fill all missing values via the simple moving average method.
'most_frequent-lma' : For any categorical column, fill all missing values with the value that appears most often in that column; for any numerical column, fill all missing values via the linear moving average method.
'most_frequent-ema' : For any categorical column, fill all missing values with the value that appears most often in that column; for any numerical column, fill all missing values via the exponential moving average method.
'most_frequent-linterp' : For any categorical column, fill all missing values with the value that appears most often in that column; for any numerical column, fill all missing values via linear interpolation.
'most_frequent-sinterp' : For any categorical column, fill all missing values with the value that appears most often in that column; for any numerical column, fill all missing values via spline interpolation.
'most_frequent-seadec' : For any categorical column, fill all missing values with the value that appears most often in that column; for any numerical column, fill all missing values via seasonal decomposition.
'most_frequent-locf' : For any categorical column, fill all missing values with the value that appears most often in that column; for any numerical column, fill all missing values via last observation carried forward.
'most_frequent-nocb' : For any categorical column, fill all missing values with the value that appears most often in that column; for any numerical column, fill all missing values via next observation carried backward.
The prefix 'most_frequent-' can be omitted for brevity; for example, 'mean' is equivalent to 'most_frequent-mean'.
Defaults to 'most_frequent-mean'.
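As a rough illustration of the column-wise rule shared by the 'most_frequent-<method>' options, the following pure-Python sketch imputes a single column (None marks a missing value) using the most frequent value for categorical data, and zero, the mean, or the median for numerical data. The helper impute_column is hypothetical and does not reproduce PAL internals; treating an all-string column as categorical is an assumption of the sketch.

```python
from collections import Counter
from statistics import mean, median

# Numerical fill rules for the non-temporal 'most_frequent-<method>' options.
NUMERIC_FILLERS = {
    "allzero": lambda observed: 0,
    "mean": mean,
    "median": median,
}


def impute_column(values, method="most_frequent-mean"):
    """Fill None entries of one column; a hypothetical helper, not PAL code."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # nothing observed, so leave the column as-is
    if all(isinstance(v, str) for v in observed):
        # Categorical column: most frequent observed value (first seen wins ties).
        fill = Counter(observed).most_common(1)[0][0]
    else:
        # Numerical column: the 'most_frequent-' prefix is optional.
        fill = NUMERIC_FILLERS[method.removeprefix("most_frequent-")](observed)
    return [fill if v is None else v for v in values]
```

For example, impute_column(["a", None, "b", "a"]) returns ["a", "a", "b", "a"], and impute_column([1.0, None, 3.0]) returns [1.0, 2.0, 3.0].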
- base_algorithm : str, optional
Specifies the base imputation algorithm used within seasonal decomposition. Applicable only to numerical columns that are imputed via seasonal decomposition (imputation_type 'most_frequent-seadec'). Valid options include:
'allzero' : Fill all missing values by zero.
'mean' : Fill all missing values by the mean of the column.
'median' : Fill all missing values by the median of the column.
'sma' : Fill all missing values via simple moving average method.
'lma' : Fill all missing values via linear moving average method.
'ema' : Fill all missing values via exponential moving average method.
'linterp' : Fill all missing values via linear interpolation.
'sinterp' : Fill all missing values via spline interpolation.
'locf' : Fill all missing values via last observation carried forward.
'nocb' : Fill all missing values via next observation carried backward.
Defaults to 'mean'.
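The two carry-based options, 'locf' and 'nocb', can be sketched in plain Python as follows. This is a hypothetical illustration, not the PAL implementation; in particular, leaving a leading gap (for LOCF) or a trailing gap (for NOCB) as None is an assumption, since how PAL treats such endpoints may depend on the extrapolation setting.

```python
def locf(values):
    """Last observation carried forward; a leading gap stays None."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v  # remember the most recent observed value
        filled.append(last)
    return filled


def nocb(values):
    """Next observation carried backward; a trailing gap stays None."""
    # Carrying backward is carrying forward on the reversed series.
    return locf(values[::-1])[::-1]
```

For the series [None, 1, None, None, 3, None], locf yields [None, 1, 1, 1, 3, 3] and nocb yields [1, 1, 3, 3, 3, None].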
- alpha : float, optional
Specifies the threshold for the autocorrelation coefficient used to detect seasonality. The valid range is (0, 1); a larger value imposes a stricter requirement for seasonality.
Defaults to 0.2.
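To make the role of alpha concrete, the sketch below computes the sample autocorrelation at each candidate lag and accepts a seasonal period only if the best coefficient exceeds alpha, so a larger alpha demands stronger seasonality. The functions acf and detect_period, the lag search range (2 up to half the series length), and the "largest coefficient above alpha" rule are all assumptions for illustration; PAL's exact seasonality test is not described here.

```python
def acf(x, lag):
    """Sample autocorrelation of x at the given lag (biased estimator)."""
    n = len(x)
    m = sum(x) / n
    denom = sum((v - m) ** 2 for v in x)
    if denom == 0:
        return 0.0  # a constant series has no autocorrelation structure
    num = sum((x[t] - m) * (x[t + lag] - m) for t in range(n - lag))
    return num / denom


def detect_period(x, alpha=0.2, max_lag=None):
    """Return the lag whose ACF is largest and exceeds alpha, or None.

    Hypothetical thresholding rule, not PAL's documented algorithm.
    """
    max_lag = max_lag or len(x) // 2
    best_lag, best_r = None, alpha
    for lag in range(2, max_lag + 1):
        r = acf(x, lag)
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag
```

For x = [1, 5, 9] * 4, detect_period(x) returns 3, while detect_period(x, alpha=0.8) returns None because the lag-3 coefficient is 0.75.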
- extrapolation : bool, optional
Specifies whether or not to extrapolate the endpoints of the time-series data.
Defaults to False.
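The effect of extrapolation on the endpoints can be illustrated with linear interpolation: interior gaps are always filled, while leading and trailing gaps are filled only when endpoints are extrapolated. The helper linterp, and the choice to extend the first and last observed segments linearly, are assumptions made for this sketch; PAL may extrapolate differently.

```python
def linterp(values, extrapolate=False):
    """Linear interpolation of None gaps; optionally extrapolate the ends."""
    out = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    # Interior gaps: straight line between the nearest observed neighbours.
    for a, b in zip(known, known[1:]):
        for i in range(a + 1, b):
            out[i] = values[a] + (i - a) * (values[b] - values[a]) / (b - a)
    if extrapolate and len(known) >= 2:
        # Leading gap: extend the line through the first two observations.
        a, b = known[0], known[1]
        slope = (values[b] - values[a]) / (b - a)
        for i in range(a):
            out[i] = values[a] + (i - a) * slope
        # Trailing gap: extend the line through the last two observations.
        a, b = known[-2], known[-1]
        slope = (values[b] - values[a]) / (b - a)
        for i in range(b + 1, len(values)):
            out[i] = values[b] + (i - b) * slope
    return out
```

With [None, 2, None, 6, None], the default call returns [None, 2, 4.0, 6, None], whereas extrapolate=True returns [0.0, 2, 4.0, 6, 8.0].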
- smooth_width : int, optional
Specifies the width of the moving average applied to non-seasonal data, where 0 indicates linear fitting to extract trends.
Effective only for data columns that are to be imputed via seasonal decomposition.
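The sketch below shows the two trend-extraction modes described for smooth_width: a centered moving average of the given width or, when the width is 0, an ordinary least-squares line. The function extract_trend is hypothetical; it assumes the series has already been pre-filled (for example by base_algorithm) and, for simplicity, accepts only odd widths and truncates the window near both ends.

```python
def extract_trend(x, smooth_width):
    """Trend of a gap-free series: OLS line if width is 0, else moving average."""
    n = len(x)
    if smooth_width == 0:
        if n < 2:
            return [float(v) for v in x]  # a line needs at least two points
        # Least-squares straight line through the points (t, x[t]).
        t_mean = (n - 1) / 2
        x_mean = sum(x) / n
        sxx = sum((t - t_mean) ** 2 for t in range(n))
        sxy = sum((t - t_mean) * (x[t] - x_mean) for t in range(n))
        slope = sxy / sxx
        return [x_mean + slope * (t - t_mean) for t in range(n)]
    if smooth_width % 2 == 0:
        raise ValueError("this sketch only handles odd smooth_width values")
    # Centered moving average; the window is truncated near both ends.
    half = smooth_width // 2
    trend = []
    for t in range(n):
        window = x[max(0, t - half): t + half + 1]
        trend.append(sum(window) / len(window))
    return trend
```

For instance, extract_trend([3, 6, 9, 6, 3], 3) gives [4.5, 6.0, 7.0, 6.0, 4.5], and extract_trend([1.0, 3.0, 5.0, 7.0], 0) recovers the exact line [1.0, 3.0, 5.0, 7.0].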
- auxiliary_normalitytest : bool, optional
Specifies whether or not to use a normality test to identify model types.
Defaults to False.
- thread_ratio : float, optional
Adjusts the percentage of available threads to use, from 0 to 1. A value of 0 indicates the use of a single thread, while 1 indicates the use of all currently available threads. Values outside this range are ignored, and the function heuristically determines the number of threads to use.
Defaults to 1.
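A minimal sketch of how a thread_ratio might map to a thread count, assuming rounding to the nearest integer with a floor of one thread. The helper resolve_thread_count is hypothetical, and the heuristic applied to out-of-range values is not modeled (the sketch simply returns None in that case).

```python
import os


def resolve_thread_count(thread_ratio, available=None):
    """Hypothetical mapping from thread_ratio to a number of threads."""
    available = available or os.cpu_count() or 1
    if not 0 <= thread_ratio <= 1:
        return None  # out of range: the library's own heuristic applies instead
    # Assumed rule: round to nearest, but always use at least one thread.
    return max(1, round(thread_ratio * available))
```

On an 8-thread machine this gives 1 thread for a ratio of 0, 4 for 0.5, and 8 for 1.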
- Attributes
- model_ : DataFrame
A column-wise time-series imputation model stored in statistics format, i.e. with stat names and stat values.
- result_ : DataFrame
The imputation result, structured the same as the data used to obtain the time-series imputation model, with all missing values filled.
Examples
>>> from hana_ml.algorithms.pal.preprocessing import ImputeTS
>>> imp = ImputeTS(imputation_type='most_frequent-linterp')
>>> res = imp.fit_transform(data=df, key='ID')
>>> res.collect()