hanaml.RDTClassifier is an R wrapper for SAP HANA PAL Random Decision Trees for classification.
hanaml.RDTClassifier(
  data = NULL,
  key = NULL,
  features = NULL,
  label = NULL,
  formula = NULL,
  n.estimators = NULL,
  max.features = NULL,
  max.depth = NULL,
  min.samples.leaf = NULL,
  split.threshold = NULL,
  calculate.oob = NULL,
  random.state = NULL,
  thread.ratio = NULL,
  allow.missing.dependent = NULL,
  categorical.variable = NULL,
  sample.fraction = NULL,
  strata = NULL,
  priors = NULL,
  compression = NULL,
  max.bits = NULL,
  quantize.rate = NULL
)
| data | |
|---|---|
| key | |
| features | |
| label | |
| formula | |
| n.estimators | |
| max.features | |
| max.depth | |
| min.samples.leaf | |
| split.threshold | |
| calculate.oob | |
| random.state | Defaults to 0. |
| thread.ratio | |
| allow.missing.dependent | Defaults to TRUE. |
| categorical.variable | VALID only for variables of "INTEGER" type, omitted otherwise. |
| sample.fraction | |
| strata | |
| priors | |
| compression | |
| max.bits | |
| quantize.rate | |
Returns an "RDTClassifier" object with the following attributes:
model : DataFrame
Trained model content.
feature.importances : DataFrame
The feature importance (the higher, the more important the feature).
oob.error : DataFrame
Out-of-bag error rate or mean squared error for random decision trees up to the indexed tree. Set to NULL if calculate.oob is FALSE.
confusion.matrix : DataFrame
Confusion matrix used to evaluate the performance of classification algorithms.
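To illustrate what a confusion matrix and an error rate express, here is a minimal base R sketch. It does not call hanaml.RDTClassifier: the `actual` and `predicted` vectors are made-up values for illustration only, and the layout of the confusion.matrix DataFrame returned by PAL may differ from the table built here.

```r
# Hypothetical labels, for illustration only (not model output).
actual <- factor(c("Play", "Play", "Do not Play",
                   "Play", "Do not Play", "Play"))
predicted <- factor(c("Play", "Do not Play", "Do not Play",
                      "Play", "Do not Play", "Play"),
                    levels = levels(actual))

# Cross-tabulate actual against predicted classes:
# rows are actual labels, columns are predicted labels.
cm <- table(actual = actual, predicted = predicted)

# Misclassification rate: share of observations off the diagonal.
error.rate <- 1 - sum(diag(cm)) / sum(cm)

print(cm)
print(error.rate)
```

The diagonal of `cm` counts correct predictions, so the error rate is the fraction of observations that fall outside it.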
Input DataFrame data:
> data$Collect()
    OUTLOOK TEMP HUMIDITY WINDY       CLASS
1     Sunny   75       70   Yes        Play
2     Sunny   80       90   Yes Do not Play
3     Sunny   85       85    No Do not Play
4     Sunny   72       95    No Do not Play
5     Sunny   69       70    No        Play
6  Overcast   72       90   Yes        Play
7  Overcast   83       78    No        Play
8  Overcast   64       65   Yes        Play
9  Overcast   81       75    No        Play
10     Rain   71       80   Yes Do not Play
11     Rain   65       70   Yes Do not Play
12     Rain   75       80    No        Play
13     Rain   68       80    No        Play
14     Rain   70       96    No        Play
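For experimenting outside the database, the same rows can be reproduced as a plain local R data.frame. This is only a sketch: `local.df` is a hypothetical name, storing TEMP and HUMIDITY as integers is an assumption, and the result is not the HANA DataFrame `data` used in the calls on this page (uploading it to SAP HANA is not shown).

```r
# Local copy of the example rows (assumed column types).
local.df <- data.frame(
  OUTLOOK = c(rep("Sunny", 5), rep("Overcast", 4), rep("Rain", 5)),
  TEMP = c(75L, 80L, 85L, 72L, 69L, 72L, 83L,
           64L, 81L, 71L, 65L, 75L, 68L, 70L),
  HUMIDITY = c(70L, 90L, 85L, 95L, 70L, 90L, 78L,
               65L, 75L, 80L, 70L, 80L, 80L, 96L),
  WINDY = c("Yes", "Yes", "No", "No", "No", "Yes", "No",
            "Yes", "No", "Yes", "Yes", "No", "No", "No"),
  CLASS = c("Play", "Do not Play", "Do not Play", "Do not Play", "Play",
            "Play", "Play", "Play", "Play", "Do not Play",
            "Do not Play", "Play", "Play", "Play"),
  stringsAsFactors = FALSE
)

# Count how many rows fall into each class.
class.counts <- table(local.df$CLASS)
print(class.counts)
```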
Call the function:
> rfc <- hanaml.RDTClassifier(data = data,
                              n.estimators = 300,
                              max.features = 3,
                              random.state = 2,
                              split.threshold = 0.00001,
                              calculate.oob = TRUE,
                              min.samples.leaf = 1,
                              thread.ratio = 1.0)
Or give the features and the label as input to generate a model:
> rfc <- hanaml.RDTClassifier(data = data,
                              n.estimators = 300,
                              max.features = 3,
                              features = list("TEMP", "HUMIDITY", "WINDY"),
                              label = "CLASS",
                              random.state = 2,
                              split.threshold = 0.00001,
                              calculate.oob = TRUE,
                              min.samples.leaf = 1,
                              thread.ratio = 1.0)
Or give the input to model generation as a formula:
> rfc <- hanaml.RDTClassifier(data = data,
                              n.estimators = 300,
                              max.features = 3,
                              formula = CLASS~TEMP+HUMIDITY+WINDY,
                              random.state = 2,
                              split.threshold = 0.00001,
                              calculate.oob = TRUE,
                              min.samples.leaf = 1,
                              thread.ratio = 1.0)
Output:
> rfc$feature.importances$Collect()
  VARIABLE_NAME IMPORTANCE
1       OUTLOOK  0.3475185
2          TEMP  0.2770724
3      HUMIDITY  0.2476346
4         WINDY  0.1277744
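Once collected, the feature importances form an ordinary R data.frame and can be inspected with base R. The sketch below rebuilds the collected output shown above by hand (the name `fi` is hypothetical), ranks the features, and checks that the importances in this example sum to approximately 1.

```r
# Values copied from the collected output shown above.
fi <- data.frame(
  VARIABLE_NAME = c("OUTLOOK", "TEMP", "HUMIDITY", "WINDY"),
  IMPORTANCE = c(0.3475185, 0.2770724, 0.2476346, 0.1277744),
  stringsAsFactors = FALSE
)

# Rank features from most to least important.
fi.sorted <- fi[order(fi$IMPORTANCE, decreasing = TRUE), ]
top.feature <- fi.sorted$VARIABLE_NAME[1]

# Sum of the importances in this example.
total <- sum(fi$IMPORTANCE)

print(fi.sorted)
print(total)
```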