hanaml.RandomForestClassifier {hana.ml.r} | R Documentation
hanaml.RandomForestClassifier is an R wrapper for PAL Random Decision Trees.
hanaml.RandomForestClassifier(conn.context, data = NULL, formula = NULL, features = NULL, label = NULL, key = NULL, n.estimators = NULL, max.features = NULL, max.depth = NULL, min.samples.leaf = NULL, split.threshold = NULL, calculate.oob = TRUE, random.state = NULL, thread.ratio = NULL, allow.missing.dependent = TRUE, categorical.variable = NULL, sample.fraction = NULL, strata = NULL, priors = NULL)
conn.context |
|
data |
|
key |
|
features |
|
label |
|
formula |
|
n.estimators |
Defaults to '100'. |
max.features |
Defaults to 'sqrt(p)' (for classification) or 'p/3' (for regression), where p is the number of input features. |
max.depth |
By default it is unlimited. |
min.samples.leaf |
|
split.threshold |
Defaults to 1e-5. |
calculate.oob |
|
random.state |
|
thread.ratio |
Defaults to -1. |
allow.missing.dependent |
|
categorical.variable |
|
sample.fraction |
|
strata |
|
priors |
|
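The max.features default above depends only on the number of input features p. A minimal sketch of that rule in base R follows. The helper name default_max_features is hypothetical (it is not part of hana.ml.r), and the rounding (round down, keep at least one feature) is an assumption, since this page does not say how PAL rounds a fractional value.

```r
# Hypothetical helper, not part of hana.ml.r: mirrors the documented defaults
# for max.features (sqrt(p) for classification, p/3 for regression).
default_max_features <- function(p, task = c("classification", "regression")) {
  task <- match.arg(task)
  m <- if (task == "classification") sqrt(p) else p / 3
  # Assumption: round down and never go below one feature.
  max(1L, as.integer(floor(m)))
}

default_max_features(4)                # classification with p = 4 gives 2
default_max_features(9, "regression")  # regression with p = 9 gives 3
```

With the four-column weather data used in the examples on this page, the classification default under these assumptions would be 2 features per split, whereas the examples set max.features=3 explicitly.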
R6Class object.
Returns a "RandomForestClassifier" object with the following attributes:
model : DataFrame
Trained model content.
feature.importance : DataFrame
The feature importance (the higher, the more important the feature).
oob.error : DataFrame
Out-of-bag error rate (or mean squared error) of the random forest
up to the indexed tree.
Set to NULL if calculate.oob is FALSE.
confusion.matrix : DataFrame
Confusion matrix used to evaluate the performance of
classification algorithms.
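To illustrate what a confusion matrix and an error rate express, here is a small base R sketch. The label vectors are invented for illustration only, and the layout of the confusion.matrix DataFrame returned by PAL may differ from the cross-tabulation shown here.

```r
# Hypothetical actual and predicted labels, invented for illustration.
actual <- factor(c("Play", "Play", "Do not Play",
                   "Play", "Do not Play", "Play"))
predicted <- factor(c("Play", "Do not Play", "Do not Play",
                      "Play", "Play", "Play"),
                    levels = levels(actual))

# Cross-tabulate: rows are actual classes, columns are predicted classes.
cm <- table(actual = actual, predicted = predicted)

# Misclassification rate: share of observations off the diagonal.
error_rate <- 1 - sum(diag(cm)) / sum(cm)

print(cm)
print(error_rate)
```

The diagonal of cm counts correct predictions, so the error rate is one minus the diagonal share. An out-of-bag error rate is the same kind of quantity, computed for each observation only from trees that did not see it during training.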
Using Summary and Print
Summary provides a general summary of the output of the model. Usage: summary(rfc) where rfc is the model generated.
Print provides information on the coefficients and the optional parameter values given by the user. Usage: print(rfc) where rfc is the model generated.
predict.RandomForestClassifier
## Not run:
Input DataFrame df for training:

> df$Collect()
    OUTLOOK TEMP HUMIDITY WINDY       CLASS
1     Sunny   75       70   Yes        Play
2     Sunny   80       90   Yes Do not Play
3     Sunny   85       85    No Do not Play
4     Sunny   72       95    No Do not Play
5     Sunny   69       70    No        Play
6  Overcast   72       90   Yes        Play
7  Overcast   83       78    No        Play
8  Overcast   64       65   Yes        Play
9  Overcast   81       75    No        Play
10     Rain   71       80   Yes Do not Play
11     Rain   65       70   Yes Do not Play
12     Rain   75       80    No        Play
13     Rain   68       80    No        Play
14     Rain   70       96    No        Play

Creating RandomForestClassifier instance:

rfc <- hanaml.RandomForestClassifier(conn.context = conn, data = df,
                                     n.estimators=300, max.features=3,
                                     random.state=2, split.threshold=0.00001,
                                     calculate.oob=TRUE, min.samples.leaf=1,
                                     thread.ratio=1.0)

Giving features and label as input to generate a model:

rfc <- hanaml.RandomForestClassifier(conn.context = conn, data = df,
                                     key = NULL,
                                     n.estimators=300, max.features=3,
                                     features = list('TEMP', 'HUMIDITY', 'WINDY'),
                                     label = "CLASS",
                                     random.state=2, split.threshold=0.00001,
                                     calculate.oob=TRUE, min.samples.leaf=1,
                                     thread.ratio=1.0)

Giving input to model generation as a formula:

rfc <- hanaml.RandomForestClassifier(conn.context = conn, data = df,
                                     n.estimators=300, max.features=3,
                                     formula=CLASS~TEMP+HUMIDITY+WINDY,
                                     random.state=2, split.threshold=0.00001,
                                     calculate.oob=TRUE, min.samples.leaf=1,
                                     thread.ratio=1.0)

> rfc$feature.importances$Collect()
  VARIABLE_NAME IMPORTANCE
1       OUTLOOK  0.3475185
2          TEMP  0.2770724
3      HUMIDITY  0.2476346
4         WINDY  0.1277744

Input DataFrame for scoring:

> df3$Collect()
  ID  OUTLOOK TEMP HUMIDITY WINDY       CLASS
1  0    Sunny   75       70   Yes        Play
2  1    Sunny   NA       90   Yes Do not Play
3  2    Sunny   85       NA    No Do not Play
4  3    Sunny   72       95    No Do not Play
5  4     <NA>   NA       70  <NA>        Play
6  5 Overcast   72       90   Yes        Play
7  6 Overcast   83       78    No        Play
8  7 Overcast   64       65   Yes Do not Play
9  8 Overcast   81       75    No        Play

Performing score() on given DataFrame:

> rfc$score(df3)
0.8932412

## End(Not run)