hanaml.Imputer {hana.ml.r} | R Documentation |
Missing value imputation for DataFrame.
hanaml.Imputer(conn.context, data = NULL, key = NULL, strategy = NULL, strategy.by.col = NULL, als.factors = NULL, als.lambda = NULL, als.maxit = NULL, als.randomstate = NULL, als.exit.threshold = NULL, als.exit.interval = NULL, als.linsolver = NULL, als.cg.maxit = NULL, als.centering = NULL, als.scaling = NULL, categorical.variable = NULL, thread.ratio = NULL)
conn.context |
|
data |
|
key |
|
strategy |
The overall imputation strategy. The choices apply mainly to numerical columns. For categorical columns, missing values that are neither left untouched nor deleted are replaced by the most frequent value of their column by default.
Defaults to 'mean'. |
strategy.by.col |
|
als.factors |
Defaults to 3. |
als.lambda |
Defaults to 0.01. |
als.maxit |
|
als.randomstate |
Specifies the seed of the random number generator used when training the
ALS model.
Defaults to 0. |
als.exit.threshold |
|
als.exit.interval |
|
als.linsolver |
|
als.cg.maxit |
|
als.centering |
|
als.scaling |
|
categorical.variable |
|
thread.ratio |
|
R6Class object.
An "Imputer" object with the following attributes:
result : DataFrame
The imputed data, with the same column structure (number of columns, column
names, and column types) as the table on which the model is trained.
model : DataFrame
statistics/model content.
The parameters with the prefix 'als' take effect only when 'als' is the overall imputation strategy. They configure the alternating least squares (ALS) model used for data imputation.
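To give a rough idea of what an ALS-based imputation does, the following base R sketch approximates a numeric matrix by a low-rank product fitted on the observed cells only, then fills the missing cells from that fit. It is an illustration under stated assumptions, not the algorithm used by hanaml.Imputer: the function `als_impute` and its arguments are hypothetical and only loosely mirror als.factors, als.lambda, als.maxit and als.randomstate, and the default of 50 iterations and the positive starting values are arbitrary choices for the sketch.

```r
# Hypothetical helper, not part of hana.ml.r: ALS matrix completion in base R.
# X is approximated by U %*% t(V), fitted on the observed (non-NA) cells only.
als_impute <- function(X, factors = 3, lambda = 0.01, maxit = 50, seed = 0) {
  set.seed(seed)
  n <- nrow(X)
  p <- ncol(X)
  observed <- !is.na(X)
  U <- matrix(0, n, factors)
  # Assumption: positive uniform starting values, chosen for this sketch only.
  V <- matrix(runif(p * factors, min = 0.5, max = 1.5), p, factors)
  ridge <- lambda * diag(factors)
  for (iter in seq_len(maxit)) {
    # Hold V fixed and solve a small ridge regression for each row factor.
    for (i in seq_len(n)) {
      o <- observed[i, ]
      if (!any(o)) next
      Vo <- V[o, , drop = FALSE]
      U[i, ] <- solve(crossprod(Vo) + ridge, crossprod(Vo, X[i, o]))
    }
    # Hold U fixed and solve a small ridge regression for each column factor.
    for (j in seq_len(p)) {
      o <- observed[, j]
      if (!any(o)) next
      Uo <- U[o, , drop = FALSE]
      V[j, ] <- solve(crossprod(Uo) + ridge, crossprod(Uo, X[o, j]))
    }
  }
  fitted <- U %*% t(V)
  X[!observed] <- fitted[!observed]  # observed cells are left unchanged
  X
}

# Usage: a rank-1 matrix with one missing cell (the true value is 2 * 3 = 6).
X <- outer(1:4, c(1, 2, 3))
X[2, 3] <- NA
filled <- als_impute(X, factors = 1)
```

Because the complete matrix in the usage example has rank 1, the filled cell should come out close to 6, slightly shrunk by the regularization term.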
## Not run:

Input DataFrame data for training:

> data$Collect()
    V0    V1    V2      V3    V4    V5
1   10    0     D       NA    1.4   23.6
2   20    1     A       0.4   1.3   21.8
3   50    1     C       NULL  1.6   21.9
4   30    NULL  B       0.8   1.7   22.6
5   10    0     A       0.2   NULL  NULL
6   10    0     <NULL>  0.5   1.8   19.7
7   NULL  0     C       0.5   NULL  17.8
8   10    1     A       0.6   1.6   24.9
9   20    NULL  D       0.9   1.7   22.2
10  30    1     D       0.4   1.3   NULL
11  50    0     <NULL>  0.3   1.2   16.4
12  NULL  1     B       0.7   1.2   19.3
13  30    1     A       0.2   1.1   21.7
14  30    0     D       NULL  NULL  NULL
15  NULL  1     C       0.5   1.8   18.6
16  20    0     A       0.6   1.4   17.9

Train the model; an "Imputer" object is returned:

> imputer <- hanaml.Imputer(conn, data,
                            strategy = "mean",
                            categorical.variable = "V1",
                            strategy.by.col = c(V1 = 0))

Expected output:

> imputer$result$Collect()
    V0  V1  V2  V3                  V4                  V5
1   10  0   D   0.5076923076923077  1.4                 23.6
2   20  1   A   0.4                 1.3                 21.8
3   50  1   C   0.5076923076923077  1.6                 21.9
4   30  0   B   0.8                 1.7                 22.6
5   10  0   A   0.2                 1.4692307692307693  20.646153846153844
6   10  0   A   0.5                 1.8                 19.7
7   24  0   C   0.5                 1.4692307692307693  17.8
8   10  1   A   0.6                 1.6                 24.9
9   20  0   D   0.9                 1.7                 22.2
10  30  1   D   0.4                 1.3                 20.646153846153844
11  50  0   A   0.3                 1.2                 16.4
12  24  1   B   0.7                 1.2                 19.3
13  30  1   A   0.2                 1.1                 21.7
14  30  0   D   0.5076923076923077  1.4692307692307693  20.646153846153844
15  24  1   C   0.5                 1.8                 18.6
16  20  0   A   0.6                 1.4                 17.9

## End(Not run)
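The imputed values for V0, V2 and V3 in the expected output can be checked locally in base R, without a HANA connection. The sketch below copies those three columns from the example (using NA for every missing cell) and applies mean imputation to the numeric columns and most-frequent-value imputation to the categorical one. The helper names `impute_mean` and `impute_mode` are hypothetical. Truncating the mean for the integer column V0 is an assumption inferred from the value 24 in the expected output, not documented behaviour.

```r
# Local copies of columns V0, V2 and V3 from the example; NA marks a missing value.
v0 <- c(10L, 20L, 50L, 30L, 10L, 10L, NA, 10L, 20L, 30L, 50L, NA, 30L, 30L, NA, 20L)
v2 <- c("D", "A", "C", "B", "A", NA, "C", "A", "D", "D", NA, "B", "A", "D", "C", "A")
v3 <- c(NA, 0.4, NA, 0.8, 0.2, 0.5, 0.5, 0.6, 0.9, 0.4, 0.3, 0.7, 0.2, NA, 0.5, 0.6)

# Hypothetical helper: replace missing numeric values by the column mean.
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# Hypothetical helper: replace missing categorical values by the most frequent value.
impute_mode <- function(x) {
  counts <- table(x)  # table() ignores NA by default
  x[is.na(x)] <- names(counts)[which.max(counts)]
  x
}

v3_filled <- impute_mean(v3)  # missing cells become 6.6 / 13, about 0.5077
v2_filled <- impute_mode(v2)  # missing cells become "A", which occurs 5 times

# Assumption: the integer column keeps its type, so the mean (320 / 13) is truncated.
v0_filled <- v0
v0_filled[is.na(v0)] <- as.integer(trunc(mean(v0, na.rm = TRUE)))
```

Under these assumptions the filled cells agree with rows 1, 3 and 14 (V3), rows 6 and 11 (V2) and rows 7, 12 and 15 (V0) of the expected output.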