| hanaml.DecisionTreeClassifier {hana.ml.r} | R Documentation |
hanaml.DecisionTreeClassifier is an R wrapper for the PAL decision tree algorithm.
hanaml.DecisionTreeClassifier (conn.context, algorithm,
data = NULL,
key = NULL,
features = NULL,
label = NULL,
formula = NULL,
thread.ratio = NULL,
allow.missing.dependent = NULL, percentage = NULL,
min.records.of.parent = NULL,
min.records.of.leaf = NULL, max.depth = NULL,
categorical.variable = NULL,
split.threshold = NULL, use.surrogate = NULL,
model.format = NULL,
discretization.type = NULL,
bins = NULL, max.branch = NULL,
merge.threshold = NULL,
priors = NULL, output.rules = NULL,
output.confusion.matrix = NULL)
conn.context |
|
algorithm |
|
data |
|
key |
|
features |
|
label |
|
formula |
|
thread.ratio |
|
allow.missing.dependent |
|
percentage |
Defaults to 1.0. |
min.records.of.parent |
Defaults to 2. |
min.records.of.leaf |
|
max.depth |
|
categorical.variable |
|
split.threshold |
The smaller the SPLIT_THRESHOLD value, the larger
a C45 or CART tree grows.
Conversely, CHAID grows a larger tree with a
larger SPLIT_THRESHOLD value. Defaults to 1e-5 for C45 and CART, and 0.05 for CHAID. |
discretization.type |
Defaults to 'mdlpc'. |
bins |
Defaults to 10 for each column. |
max.branch |
Defaults to 10. |
merge.threshold |
Defaults to 0.05. |
use.surrogate |
Defaults to TRUE. |
model.format |
Defaults to 'json'. |
output.rules |
Defaults to TRUE. |
priors |
|
output.confusion.matrix |
Defaults to TRUE. |
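As a rough illustration of the split.threshold argument above, the following base R sketch computes the information gain ratio of a split on WINDY for the training data shown in the examples and compares it with the C45 default of 1e-5. It assumes that C45 stops splitting a node once the gain ratio of the best split falls below the threshold; the helpers entropy and gain_ratio are hypothetical names written for this sketch and are not part of hana.ml.r or PAL.

```r
# Shannon entropy (in bits) of a vector of labels.
entropy <- function(labels) {
  p <- table(labels) / length(labels)
  p <- p[p > 0]
  -sum(p * log2(p))
}

# Information gain ratio of splitting `labels` by a categorical `feature`
# (hypothetical helper; assumed to mirror the C45 split criterion).
gain_ratio <- function(feature, labels) {
  weights <- table(feature) / length(feature)
  child_entropy <- tapply(labels, feature, entropy)
  weighted <- sum(as.numeric(weights[names(child_entropy)]) *
                  as.numeric(child_entropy))
  gain <- entropy(labels) - weighted
  gain / entropy(feature)  # divide by the split information
}

# WINDY and CLASS columns of the 14 training rows from the examples.
windy <- c("Yes", "Yes", "No", "No", "No", "Yes", "No",
           "Yes", "No", "Yes", "Yes", "No", "No", "No")
play_class <- c("Play", "Do not Play", "Do not Play", "Do not Play", "Play",
                "Play", "Play", "Play", "Play", "Do not Play", "Do not Play",
                "Play", "Play", "Play")

split_threshold <- 1e-5  # documented default for C45 and CART
ratio <- gain_ratio(windy, play_class)

# Under the stated assumption, the split remains a candidate while TRUE.
keep_split <- ratio >= split_threshold
print(ratio)
print(keep_split)
```

Raising split_threshold above the printed ratio would, under the same assumption, rule this split out, which is consistent with smaller thresholds producing larger C45 trees.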
R6Class object.
A "DecisionTreeClassifier" object with the following attributes:
model: DataFrame
Trained model content.
decision.rules: DataFrame
Rules for decision tree to make decisions.
confusion.matrix: DataFrame
Confusion matrix used to evaluate the performance of
classification algorithms.
Using Summary and Print
Summary provides a general summary of the output of the model.
Usage: summary(dtc) where dtc is the model generated.
Print provides information on the coefficients and the optional
parameter values given by the user.
Usage: print(dtc) where dtc is the model generated.
## Not run:
Input DataFrame for training:
> data$Collect()
OUTLOOK TEMP HUMIDITY WINDY CLASS
1 Sunny 75 70 Yes Play
2 Sunny 80 90 Yes Do not Play
3 Sunny 85 85 No Do not Play
4 Sunny 72 95 No Do not Play
5 Sunny 69 70 No Play
6 Overcast 72 90 Yes Play
7 Overcast 83 78 No Play
8 Overcast 64 65 Yes Play
9 Overcast 81 75 No Play
10 Rain 71 80 Yes Do not Play
11 Rain 65 70 Yes Do not Play
12 Rain 75 80 No Play
13 Rain 68 80 No Play
14 Rain 70 96 No Play
Creating DecisionTreeClassifier model:
dtc = hanaml.DecisionTreeClassifier( conn, algorithm = 'c45', data = data,
features = list('TEMP', 'HUMIDITY', 'WINDY'),
label = "CLASS", key = NULL,
min.records.of.parent = 2, min.records.of.leaf = 1,
thread.ratio = 0.4, split.threshold = 1e-5,
model.format = 'json', output.rules = TRUE )
Giving input to create a model as a formula:
dtc = hanaml.DecisionTreeClassifier( conn, algorithm = 'c45', data = data,
formula = CATEGORY~V1+V2+V3, key = "ID",
min.records.of.parent = 2, min.records.of.leaf = 1,
thread.ratio = 0.4, split.threshold = 1e-5,
model.format = 'json', output.rules = TRUE )
> dtc$decision.rules$Collect()
ROW_INDEX RULES_CONTENT
0 0 (TEMP>=84) => Do not Play
1 1 (TEMP<84) && (OUTLOOK=Overcast) => Play
2 2 (TEMP<84) && (OUTLOOK=Sunny) && (HUMIDITY<82.5) => Play
3 3 (TEMP<84) && (OUTLOOK=Sunny) && (HUMIDITY>=82.5) => Do not Play
4 4 (TEMP<84) && (OUTLOOK=Rain) && (WINDY=Yes) => Do not Play
5 5 (TEMP<84) && (OUTLOOK=Rain) && (WINDY=No) => Play
Input DataFrame for predicting:
> data2$Collect()
ID OUTLOOK HUMIDITY TEMP WINDY
0 0 Overcast 75.0 70 Yes
1 1 Rain 78.0 70 Yes
2 2 Sunny 66.0 70 Yes
3 3 Sunny 69.0 70 Yes
4 4 Rain NaN 70 Yes
5 5 None 70.0 70 Yes
6 6 *** 70.0 70 Yes
Performing predict() on given DataFrame:
> result = predict(dtc,data2, verbose=FALSE)
ID SCORE CONFIDENCE
0 0 Play 1.000000
1 1 Do not Play 1.000000
2 2 Play 1.000000
3 3 Play 1.000000
4 4 Do not Play 1.000000
5 5 Play 0.692308
6 6 Play 0.692308
Here:
dtc is the model generated.
data2 is the DataFrame to predict from.
Input DataFrame for scoring:
> data3$Collect()
ID OUTLOOK TEMP HUMIDITY WINDY CLASS
0 Sunny 75 70 Yes Play
1 Sunny NA 90 Yes Do not Play
2 Sunny 85 NA No Do not Play
3 Sunny 72 95 No Do not Play
4 <NA> NA 70 <NA> Play
5 Overcast 72 90 Yes Play
6 Overcast 83 78 No Play
7 Overcast 64 65 Yes Do not Play
8 Overcast 81 75 No Play
Performing score() on given DataFrame:
> dtc$score(data3)
0.75
## End(Not run)
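The decision rules printed in the example above can be traced by hand without a database connection. The following base R sketch translates the six printed rules into a plain function and applies it to the first four rows of data2; apply_rules and data2_head are hypothetical names used only for this illustration, and the sketch does not reproduce how PAL handles missing or unseen values (rows 4 to 6 of data2).

```r
# Hypothetical helper: a hand-written translation of the six printed rules.
apply_rules <- function(outlook, temp, humidity, windy) {
  if (temp >= 84) return("Do not Play")
  if (outlook == "Overcast") return("Play")
  if (outlook == "Sunny") {
    return(if (humidity < 82.5) "Play" else "Do not Play")
  }
  if (outlook == "Rain") {
    return(if (windy == "Yes") "Do not Play" else "Play")
  }
  NA_character_  # no rule matches; PAL's own fallback is not modelled here
}

# Local copy of the first four rows of data2 from the example.
data2_head <- data.frame(
  ID = 0:3,
  OUTLOOK = c("Overcast", "Rain", "Sunny", "Sunny"),
  HUMIDITY = c(75, 78, 66, 69),
  TEMP = c(70, 70, 70, 70),
  WINDY = c("Yes", "Yes", "Yes", "Yes"),
  stringsAsFactors = FALSE
)

# Apply the rules row by row.
data2_head$SCORE <- mapply(apply_rules,
                           data2_head$OUTLOOK, data2_head$TEMP,
                           data2_head$HUMIDITY, data2_head$WINDY,
                           USE.NAMES = FALSE)
print(data2_head[, c("ID", "SCORE")])
```

For these four rows the hand-applied rules agree with the SCORE column returned by predict() in the example.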