hanaml.DecisionTreeRegressor {hana.ml.r} | R Documentation |
Decision Tree Model for Regression
Description
hanaml.DecisionTreeRegressor is an R wrapper for the PAL decision tree algorithm.
Usage
hanaml.DecisionTreeRegressor (conn.context, data = NULL,
key = NULL, features = NULL,
label = NULL, formula = NULL,
thread.ratio = NULL,
allow.missing.dependent = NULL,
percentage = NULL,
min.records.of.parent = NULL,
min.records.of.leaf = NULL, max.depth = NULL,
categorical.variable = NULL,
split.threshold = NULL,
use.surrogate = NULL, model.format = NULL,
discretization.type = NULL,
bins = NULL, max.branch = NULL,
merge.threshold = NULL,
output.rules = NULL
)
Arguments
conn.context |
ConnectionContext
The connection to the SAP HANA system.
|
data |
DataFrame
DataFrame containing the data.
|
key |
character, optional
Name of the ID column of data.
If not provided, then data is assumed to have no ID column.
|
features |
list of character, optional
Names of the feature columns.
If features is not provided, it defaults
to all non-ID, non-label columns.
|
label |
character, optional
Name of the column in data
that specifies the dependent variable.
Defaults to the last non-ID column if not provided.
|
formula |
formula type, optional
Formula to be used for model generation.
Format: label ~ <feature_list>
Example: formula = CATEGORY ~ V1 + V2 + V3
Provide either a formula or a features and label combination,
but not both.
Defaults to NULL.
|
thread.ratio |
double, optional
Controls the proportion of available threads that can be used.
The value range is from 0 to 1, where 0 indicates a single thread,
and 1 indicates up to all available threads.
Values between 0 and 1 will use up to that
percentage of available threads.
For values outside this range, the number of threads is heuristically determined.
Defaults to -1.
|
allow.missing.dependent |
logical, optional
Specifies if a missing target value is allowed.
FALSE: a missing target value is not allowed,
and an error occurs if one is present.
TRUE: a missing target value is allowed,
and records with a missing target are removed.
Defaults to TRUE.
|
percentage |
double, optional
Specifies the percentage of the input data that will be used to build the tree model.
The rest of the data will be used for pruning.
Defaults to 1.0.
|
min.records.of.parent |
integer, optional
Specifies the stop condition. If the number of records in one node is less
than the specified value, the algorithm stops splitting.
Defaults to 2.
|
min.records.of.leaf |
integer, optional
Specifies the minimum number of records in a leaf.
Defaults to 1.
|
max.depth |
integer, optional
The maximum depth of a tree.
By default it is unlimited.
|
categorical.variable |
list of characters, optional
Indicates which features should be treated as categorical.
By default, the treatment depends on the column type:
'string': categorical
'integer' and 'double': continuous.
Valid only for integer variables; ignored otherwise.
The default value is detected from the input data.
|
split.threshold |
double, optional
Specifies the stop condition for a node.
CART: splitting stops when the reduction of the Gini index or relative MSE
of the best split is less than this value.
The smaller the SPLIT_THRESHOLD value is, the larger a CART tree grows.
Defaults to 1e-5 for CART.
|
use.surrogate |
logical, optional
Indicates whether to use a surrogate split when NULL values are encountered.
FALSE does not use surrogate split.
TRUE uses a surrogate split.
Only valid for CART.
Defaults to TRUE.
|
model.format |
character, optional
Specifies the format in which the tree model is stored.
Valid options are 'json' and 'pmml'.
Defaults to 'json'.
|
discretization.type |
character, optional
Specifies the strategy for discretizing continuous attributes.
Valid options are 'mdlpc' and 'equal_freq'.
Valid only for C45 and CHAID.
Defaults to 'mdlpc'.
|
bins |
list, optional
Specifies the number of bins for discretization as a list.
Each element in the list must be named, with names being column names,
and values being the number of bins for discretization.
Only valid when discretization.type is 'equal_freq'.
Defaults to 10 for each column.
|
max.branch |
integer, optional
Specifies the maximum number of branches.
Defaults to 10.
|
merge.threshold |
double, optional
Specifies the merge condition for CHAID:
if the metric value is greater than or
equal to the specified value, the algorithm will merge two branches.
|
output.rules |
logical, optional
Specifies whether to output decision rules.
FALSE does not output decision rules.
TRUE outputs decision rules.
Defaults to TRUE.
|
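The formula and bins arguments above both expect a specific R value shape, which the following minimal sketch illustrates using base R only. It makes no SAP HANA connection and calls no hana.ml.r function. The column names TEMP and HUMIDITY and the helper is_fully_named are hypothetical, introduced for illustration; how the package parses these arguments internally is not shown here.

```r
# Base R only; no SAP HANA connection or hana.ml.r call is made here.

# 1. A formula carries the same information as a label/features pair.
f <- CATEGORY ~ V1 + V2 + V3
label <- all.vars(f)[1]                             # response variable name
features <- as.list(attr(terms(f), "term.labels"))  # right-hand-side terms

# 2. `bins` must be a named list: names are column names, values are bin counts.
#    TEMP and HUMIDITY are hypothetical column names.
bins <- list(TEMP = 5L, HUMIDITY = 8L)

# Hypothetical helper: check that every element of a list has a non-empty name.
is_fully_named <- function(x) {
  nms <- names(x)
  length(x) > 0 && !is.null(nms) && all(!is.na(nms) & nzchar(nms))
}

is_fully_named(bins)          # TRUE
is_fully_named(list(5L, 8L))  # FALSE: element names are missing
```

A check like is_fully_named can be run on the client before the model is created, so that an unnamed bins list is caught early rather than at training time.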
Format
R6Class object.
Value
A "DecisionTreeRegressor" object with the following attributes:
model : DataFrame
Trained model content.
decision.rules : DataFrame
Rules for decision tree to make decisions.
confusion.matrix : DataFrame
Confusion matrix used to evaluate the performance of classification algorithms.
Note
Using Summary and Print
Summary provides a general summary of the output of the model.
Usage: summary(dtr) where dtr is the model generated
Print provides information on the coefficients and the optional parameter
values given by the user.
Usage: print(dtr) where dtr is the model generated.
Examples
## Not run:
Input DataFrame for training:
> head(data$Collect(),5)
OUTLOOK TEMP HUMIDITY WINDY CLASS
1 Sunny 75 70 Yes 1
2 Sunny 80 90 Yes 0
3 Sunny 85 85 No 0
4 Sunny 72 95 No 0
5 Sunny 69 70 No 1
Creating DecisionTreeRegressor model:
> dtr <- hanaml.DecisionTreeRegressor(conn, data = data,
    features = list("OUTLOOK", "TEMP", "HUMIDITY", "WINDY"), label = "CLASS",
    min.records.of.parent = 2, min.records.of.leaf = 1,
    thread.ratio = 0.4, split.threshold = 1e-5,
    model.format = 'pmml', output.rules = TRUE)
Passing the input as a formula:
> dtr <- hanaml.DecisionTreeRegressor(conn, data = data,
    formula = CLASS ~ OUTLOOK + TEMP + HUMIDITY + WINDY, key = NULL,
    min.records.of.parent = 2, min.records.of.leaf = 1,
    thread.ratio = 0.4, split.threshold = 1e-5,
    model.format = 'pmml', output.rules = TRUE)
## End(Not run)
[Package hana.ml.r version 1.0.8 Index]