| hanaml.RandomForestRegressor {hana.ml.r} | R Documentation |
hanaml.RandomForestRegressor is an R wrapper for PAL Random Decision Trees (regression).
hanaml.RandomForestRegressor(conn.context, data = NULL,
formula = NULL,
features = NULL,
label = NULL, key = NULL,
n.estimators = NULL,
max.features = NULL,
max.depth = NULL,
min.samples.leaf = NULL,
split.threshold = NULL,
calculate.oob = TRUE,
random.state = NULL,
thread.ratio = NULL,
allow.missing.dependent = TRUE,
categorical.variable = NULL,
sample.fraction = NULL)
conn.context: Connection to the SAP HANA instance (a ConnectionContext object).

data: DataFrame containing the training data.

formula: An R formula specifying the label and the features, e.g. CLASS ~ A + B + C + D. Use either formula or the features/label arguments, not both.

key: Name of the ID column in data.

features: Names of the feature columns, given as a list. If not provided, defaults to all non-ID, non-label columns.

label: Name of the dependent-variable column.

n.estimators: Number of trees to grow.

max.features: Number of randomly selected features considered when splitting a node.

max.depth: Maximum depth of each tree.

min.samples.leaf: Minimum number of records in a leaf node.

split.threshold: Minimum improvement required to split a node.

calculate.oob: Whether to calculate the out-of-bag error. Defaults to TRUE.

random.state: Seed for the random number generator used to grow the forest.

thread.ratio: Proportion of available threads to use, in the range [0, 1].

allow.missing.dependent: Whether a missing value in the dependent variable is allowed; such rows are treated as unlabeled. Defaults to TRUE.

categorical.variable: Names of integer columns that should be treated as categorical rather than continuous.

sample.fraction: Fraction of the data used to train each tree.
Returns an R6Class object of class "RandomForestRegressor" with the following values:
model : DataFrame
Trained model content.
feature.importance : DataFrame
The feature importance (the higher, the more important the feature).
oob.error : DataFrame
The out-of-bag error rate or mean squared error of the random forest up to the indexed tree.
Set to NULL if calculate.oob is FALSE.
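The fitted values can be read back from SAP HANA into local R data.frames via Collect(). A sketch, assuming a fitted model rfr:

```r
# Pull the fitted artifacts from SAP HANA into local R data.frames.
model.df      <- rfr$model$Collect()              # trained model content
importance.df <- rfr$feature.importance$Collect() # per-feature importance
oob.df        <- rfr$oob.error$Collect()          # only if calculate.oob = TRUE
```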
Using Summary and Print
Summary provides a general summary of the fitted model. Usage: summary(rfr), where rfr is the fitted model.
Print provides information on the coefficients and the optional parameter values supplied by the user. Usage: print(rfr), where rfr is the fitted model.
## Not run:
Input DataFrame df for training:
> df$Collect()
ID A B C D CLASS
0 0 -0.965679 1.142985 -0.019274 -1.598807 -23.633813
1 1 2.249528 1.459918 0.153440 -0.526423 212.532559
2 2 -0.631494 1.484386 -0.335236 0.354313 26.342585
3 3 -0.967266 1.131867 -0.684957 -1.397419 -62.563666
4 4 -1.175179 -0.253179 -0.775074 0.996815 -115.534935
......
Creating RandomForestRegressor instance and generating model:
> rfr <- hanaml.RandomForestRegressor(conn.context=cc, data = df, random.state=3)
> rfr$feature.importance$Collect()
VARIABLE_NAME IMPORTANCE
0 A 0.249593
1 B 0.381879
2 C 0.291403
3 D 0.077125
Input DataFrame for scoring:
> head(df3$Collect(),5)
ID A B C D CLASS
0 0 1.081277 0.204114 1.220580 -0.750665 139.10170
1 1 0.524813 -0.012192 -0.418597 2.946886 52.17203
2 2 -0.280871 0.100554 -0.343715 -0.118843 -34.69829
3 3 -0.113992 -0.045573 0.957154 0.090350 51.93602
4 4 0.287476 1.266895 0.466325 -0.432323 106.63425
..
Performing score() on given DataFrame:
> rfr$score(data = df3, features = list("A", "B", "C", "D"))
0.8490768
## End(Not run)
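The single number returned by score() appears to be the coefficient of determination (R²) of the predictions on the given data; that interpretation is an assumption here, matching the behaviour of the Python hana_ml counterpart rather than anything stated above. A self-contained base-R sketch of the computation on made-up numbers:

```r
# R^2 = 1 - SS_res / SS_tot, on hypothetical actual/predicted values
# (these vectors are illustrative, not output of the model above).
actual    <- c(139.10, 52.17, -34.70, 51.94, 106.63)
predicted <- c(130.25, 48.90, -30.12, 55.01, 101.47)

ss.res <- sum((actual - predicted)^2)    # residual sum of squares
ss.tot <- sum((actual - mean(actual))^2) # total sum of squares
r.squared <- 1 - ss.res / ss.tot
print(r.squared)
```

A perfect fit gives R² = 1; a model no better than predicting the mean gives R² = 0, and worse-than-mean models go negative.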