hanaml.ImputeTS.Rd
This is an R wrapper for SAP PAL procedure PAL_IMPUTE_TIME_SERIES.
hanaml.ImputeTS(
data = NULL,
key = NULL,
categorical.variable = NULL,
imputation.type = NULL,
base.algorithm = NULL,
col.imputation.type = NULL,
alpha = NULL,
extrapolation = NULL,
smooth.width = NULL,
auxiliary.normalitytest = NULL,
thread.ratio = NULL
)
DataFrame
Specifies the input time-series data for missing value handling.
str
Specifies the column name in data that represents the order of the time-series.
character or list/vector of characters, optional
Indicates which features should be treated as categorical variables.
The default behavior is dependent on what input is given:
"VARCHAR" and "NVARCHAR": categorical
"INTEGER" and "DOUBLE": continuous.
Valid only for variables of "INTEGER" type; ignored otherwise.
No default value.
str, optional
Specifies the overall imputation type (i.e. strategy) for all columns in data (exclusive of the key column).
"non": Does nothing. Leaves all columns untouched.
"most_frequent.allzero": For any categorical column, fill all missing values by the value that appears most often in that column; while for any numerical column, fill all missing values by zero.
"most_frequent.mean": For any categorical column, fill all missing values by the value that appears most often in that column; while for any numerical column, fill all missing values by its mean.
"most_frequent.median": For any categorical column, fill all missing values by the value that appears most often in that column; while for any numerical column, fill all missing values by median.
"most_frequent.sma": For any categorical column, fill all missing values by the value that appears most often in that column; while for any numerical column, fill all missing values via simple moving average method.
"most_frequent.lma": For any categorical column, fill all missing values by the value that appears most often in that column; while for any numerical column, fill all missing values via linear moving average method.
"most_frequent.ema": For any categorical column, fill all missing values by the value that appears most often in that column; while for any numerical column, fill all missing values by exponential moving average method.
"most_frequent.linterp": For any categorical column, fill all missing values by the value that appears most often in that column; while for any numerical column, fill all missing values via linear interpolation.
"most_frequent.sinterp": For any categorical column, fill all missing values by the value that appears most often in that column; while for any numerical column, fill all missing values via spline interpolation.
"most_frequent.seadec": For any categorical column, fill all missing values by the value that appears most often in that column; while for any numerical column, fill all missing values via seasonal decompose.
"most_frequent.locf": For any categorical column, fill all missing values by the value that appears most often in that column; while for any numerical column, fill all missing values via last observation carried forward.
"most_frequent.nocb": For any categorical column, fill all missing values by the value that appears most often in that column; while for any numerical column, fill all missing values via next observation carried backward.
The prefix "most_frequent" can be omitted for simplicity. For example, "most_frequent.linterp" can be
simply replaced by "linterp" when inputting the imputation type.
Defaults to "most_frequent.mean".
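To make a few of the numerical strategies above concrete, here is a minimal base R sketch of mean, median, LOCF and NOCB filling on a toy vector. The helper names (fill_mean, fill_median, fill_locf, fill_nocb) are hypothetical and exist only for illustration; this is not how PAL implements these strategies, and edge-case behavior (for example a leading missing value) may differ.

```r
# Illustrative only: base R versions of a few numerical fill strategies.
v <- c(0.1, 0.3, NA, 0.7, NA, 1.1)

# Replace every NA by the mean / median of the observed values.
fill_mean   <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))
fill_median <- function(x) replace(x, is.na(x), median(x, na.rm = TRUE))

# Last observation carried forward: each NA takes the most recent value.
# A leading NA has no predecessor and stays NA in this sketch.
fill_locf <- function(x) {
  for (i in seq_along(x)[-1]) {
    if (is.na(x[i])) x[i] <- x[i - 1]
  }
  x
}

# Next observation carried backward: LOCF applied to the reversed series.
fill_nocb <- function(x) rev(fill_locf(rev(x)))

fill_mean(v)
fill_median(v)
fill_locf(v)
fill_nocb(v)
```

On this vector, LOCF fills the two gaps with 0.3 and 0.7, while NOCB fills them with 0.7 and 1.1.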
list, optional
Specifies the column-wise imputation type that overwrites the overall imputation type.
Should be a named list such that the name of each element corresponds to a column name in data, while the element value corresponds to a valid column imputation type.
Valid column imputation types include:
"allzero" : Fill all missing values by zero.
"mean" : Fill all missing values by the mean of the column.
"median" : Fill all missing values by the median of the column.
"sma" : Fill all missing values via simple moving average method.
"lma" : Fill all missing values via linear moving average method.
"ema" : Fill all missing values via exponential moving average method.
"linterp" : Fill all missing values via linear interpolation.
"sinterp" : Fill all missing values via spline interpolation.
"locf" : Fill all missing values via last observation carried forward.
"nocb" : Fill all missing values via next observation carried backward.
Among the above listed imputation types, "non" applies to both numerical and categorical columns, "most_frequent" applies to categorical columns only, while the rest apply to numerical columns only. If the input goes beyond the above list of options, it will be treated as a constant value for the universal replacement of all missing values in that column.
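As a sketch of the named-list shape described above, the following builds a column-wise specification for two columns. The column names V and X are borrowed from the example data on this page, and the choice of strategies is arbitrary. The call to hanaml.ImputeTS is left commented out because it needs a live SAP HANA connection and an existing DataFrame named data, both of which are assumed here.

```r
# Names must match column names in the input data; values are column
# imputation types. "V" and "X" come from the example data on this page.
col.types <- list(
  V = "locf",          # numerical column: last observation carried forward
  X = "most_frequent"  # categorical column: most frequent value
)

# Requires a HANA connection and a DataFrame `data` (assumed, not created here):
# imp <- hanaml.ImputeTS(data, key = "ID",
#                        imputation.type = "non",
#                        col.imputation.type = col.types)

names(col.types)
```

Columns not named in the list would fall back to the overall imputation.type.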
numeric, optional
Specifies the criterion for the autocorrelation coefficient.
Valid values range from 0 to 1.
A larger value indicates a stricter requirement for seasonality.
Defaults to 0.2.
logical, optional
Specifies whether or not to extrapolate the endpoints of the time-series data.
Defaults to FALSE.
integer, optional
Specifies the width of the moving average applied to non-seasonal data,
where 0 indicates linear fitting to extract trends.
Effective only for data columns that are to be imputed via seasonal decompose.
logical, optional
Specifies whether or not to use a normality test to identify model types.
Defaults to FALSE.
double, optional
Controls the proportion of available threads that can be used by this function.
The value range is from 0 to 1, where 0 indicates a single thread, and 1 indicates all available threads.
Values between 0 and 1 will use up to that percentage of available threads. Values outside this range are ignored.
Defaults to 0.
str, optional
Specifies the base imputation algorithm for seasonal decompose.
Applicable only to numerical data columns that are to be imputed by seasonal decompose.
Valid options include:
"allzero" : Fill all missing values by zero.
"mean" : Fill all missing values by the mean of the column.
"median" : Fill all missing values by the median of the column.
"sma" : Fill all missing values via simple moving average method.
"lma" : Fill all missing values via linear moving average method.
"ema" : Fill all missing values via exponential moving average method.
"linterp" : Fill all missing values via linear interpolation.
"sinterp" : Fill all missing values via spline interpolation.
"locf" : Fill all missing values via last observation carried forward.
"nocb" : Fill all missing values via next observation carried backward.
An "ImputeTS" object with the following attributes:
result : DataFrame
Has the same column structure (number of columns, column names, and column types) as the table with which the model is trained.
model : DataFrame
Statistics/model content.
Input time-series data for imputation:
> data$Collect()
ID V X
1 0 0.1 A
2 1 0.3 A
3 2 NA A
4 3 0.7 <NA>
5 4 0.9 B
6 5 1.1 B
Setting up a proper imputation strategy to fill in all missing values:
> imp <- hanaml.ImputeTS(data, key = 'ID', imputation.type = 'most_frequent.linterp')
> imp$result$Collect()
ID V X
1 0 0.1 A
2 1 0.3 A
3 2 0.5 A
4 3 0.7 A
5 4 0.9 B
6 5 1.1 B
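For readers without a HANA instance, the result above can be approximated locally in base R. This is only a sketch under stated assumptions: a plain data.frame named df stands in for the HANA DataFrame, stats::approx stands in for PAL's linear interpolation, and a frequency table stands in for the most-frequent rule. It illustrates the strategy, not the PAL procedure itself.

```r
# Local stand-in for the example input (a plain data.frame, not a HANA DataFrame).
df <- data.frame(
  ID = 0:5,
  V  = c(0.1, 0.3, NA, 0.7, 0.9, 1.1),
  X  = c("A", "A", "A", NA, "B", "B"),
  stringsAsFactors = FALSE
)

# Numerical column: linear interpolation over the key column.
# approx() ignores (x, y) pairs with missing values when fitting.
df$V <- approx(x = df$ID, y = df$V, xout = df$ID)$y

# Categorical column: replace NA with the most frequent observed value.
# table() excludes NA by default, so only observed values are counted.
most_frequent <- names(which.max(table(df$X)))
df$X[is.na(df$X)] <- most_frequent

df
```

The missing V at ID 2 lies midway between 0.3 and 0.7, so it is filled with 0.5, and the missing X is filled with "A", which occurs three times against two for "B".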