hanaml.DecisionTreeRegressor.Rd
hanaml.DecisionTreeRegressor is an R wrapper for the SAP HANA PAL decision tree algorithm.
hanaml.DecisionTreeRegressor(
data = NULL,
key = NULL,
features = NULL,
label = NULL,
formula = NULL,
thread.ratio = NULL,
allow.missing.dependent = NULL,
percentage = NULL,
min.records.of.parent = NULL,
min.records.of.leaf = NULL,
max.depth = NULL,
categorical.variable = NULL,
split.threshold = NULL,
use.surrogate = NULL,
model.format = NULL,
output.rules = NULL,
evaluation.metric = NULL,
parameter.range = NULL,
parameter.values = NULL,
resampling.method = NULL,
repeat.times = NULL,
fold.num = NULL,
param.search.strategy = NULL,
random.search.times = NULL,
timeout = NULL,
progress.indicator.id = NULL
)
DataFrame
DataFrame containing the data.
character, optional
Name of the ID column.
If not provided, the data is assumed to have no ID column.
No default value.
character or list of characters, optional
Names of the feature columns.
If not provided, it defaults to all non-key, non-label columns of data.
character, optional
Name of the column which specifies the dependent variable.
Defaults to the last column of data if not provided.
formula type, optional
Formula to be used for model generation.
format = label~<feature_list>
e.g.: formula=CATEGORY~V1+V2+V3
You can either give the formula,
or a feature and label combination, but do not provide both.
Defaults to NULL.
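As a rough illustration of the label~<feature_list> format, the base R sketch below splits a formula into its label and feature names. It only shows how the two sides of the formula correspond to the label and features arguments; it is not the hana.ml.r implementation, and the column names are the ones from the example above.

```r
# Illustrative only: base R, no SAP HANA connection required.
f <- CATEGORY ~ V1 + V2 + V3

# Left-hand side of the formula: the dependent variable (label).
label <- all.vars(f[[2]])

# Right-hand side of the formula: the feature columns.
features <- attr(terms(f), "term.labels")

print(label)
print(features)
```

Passing formula = CATEGORY~V1+V2+V3 is therefore expected to be equivalent to passing label and features with these names, which is why only one of the two forms should be given.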
double, optional
Controls the proportion of available threads that can be used by this
function.
The value range is from 0 to 1, where 0 indicates a single thread,
and 1 indicates all available threads. Values between 0 and 1 will use up to
that percentage of available threads.
Values outside the range from 0 to 1 are ignored, and the actual number of threads
used is then heuristically determined.
Defaults to -1.
logical, optional
Specifies if a missing target value is allowed.
FALSE does not allow the missing target value.
An error occurs if a missing target is present.
TRUE allows the missing target value.
The datum with the missing target is removed.
Defaults to TRUE.
double, optional
Specifies the percentage of the input data that will be used to build the tree model.
The rest of the data will be used for pruning.
Defaults to 1.0.
integer, optional
Specifies the stop condition. If the number of records in one node is less
than the specified value, the algorithm stops splitting.
Defaults to 2.
integer, optional
Specifies the minimum number of records that a leaf must contain.
Defaults to 1.
integer, optional
The maximum depth of a tree.
By default it is unlimited.
character or list/vector of characters, optional
Indicates which features should be treated as categorical variables.
The default behavior depends on the column type of the input:
"VARCHAR" and "NVARCHAR": categorical
"INTEGER" and "DOUBLE": continuous.
Valid only for variables of "INTEGER" type; ignored otherwise.
No default value.
double, optional
Specifies the stop condition for a node: splitting stops when
the reduction of the Gini index or relative MSE of the best split is
less than this value in the "cart" algorithm.
The smaller the value, the larger a "cart" tree grows.
Defaults to 1e-5.
logical, optional
Indicates whether to use a surrogate split when NULL values are encountered.
FALSE does not use surrogate split.
TRUE uses a surrogate split.
Defaults to TRUE.
character, optional
Specifies the format in which the tree model is stored.
Valid options are "json" and "pmml".
Defaults to "json".
logical, optional
Specifies whether to output decision rules or not.
FALSE does not output decision rules.
TRUE outputs decision rules.
Defaults to TRUE.
c("rmse", "mae"), optional
Specifies the evaluation metric for model evaluation or parameter selection.
Defaults to "rmse".
list, optional
Specifies the range of the following parameters for parameter selection:
min.records.of.leaf, min.records.of.parent, max.depth, split.threshold.
A parameter range should be specified by 3 numbers in the form of c(start, step, end).
Example:
parameter.range <- list(split.threshold = c(1e-5, 2e-5, 1e-4))
If param.search.strategy is "random", then the step has no effect
and thus can be omitted.
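To make the c(start, step, end) form concrete, the base R sketch below expands a range into the candidate values a grid search would step through. The helper expand_range is hypothetical and exists only for this illustration; how PAL treats an end value that does not fall exactly on a step is an assumption here, not something stated above.

```r
# c(start, step, end), as described for parameter.range.
parameter.range <- list(split.threshold = c(1e-5, 2e-5, 1e-4))

# Hypothetical helper (not part of hana.ml.r): expand one range
# into the sequence start, start + step, ... that does not exceed end.
expand_range <- function(r) seq(from = r[1], to = r[3], by = r[2])

candidates <- lapply(parameter.range, expand_range)

# Number of candidate values for split.threshold under this assumption.
print(length(candidates$split.threshold))
```

Under this reading, the range above yields five candidates, since the next step after 9e-5 would exceed the end value 1e-4.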
list, optional
Specifies the values of the following parameters for parameter selection:
min.records.of.leaf, min.records.of.parent, max.depth, split.threshold.
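A minimal sketch of what a parameter.values list might look like, and of how many combinations a grid search over it would cover, is shown below. The specific values are invented for illustration, and evaluating every combination is the usual meaning of grid search rather than a documented guarantee of this function.

```r
# Illustrative candidate values; the numbers are assumptions, not recommendations.
parameter.values <- list(
  max.depth = c(3, 5, 7),
  min.records.of.leaf = c(1, 2, 5)
)

# Base R: enumerate every combination of the listed values.
grid <- expand.grid(parameter.values)

# One row per candidate parameter combination.
print(nrow(grid))
```

The grid grows multiplicatively with each added parameter, which is worth keeping in mind when choosing between "grid" and "random" for param.search.strategy.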
character, optional
Specifies the resampling method for model evaluation or parameter selection.
Valid options include: "cv", "bootstrap".
If no value is specified for this parameter, neither model evaluation
nor parameter selection is activated.
numeric, optional
Specifies the number of repeat times for resampling.
Defaults to 1.
integer, optional
Specifies the fold number for cross-validation ("cv").
Mandatory and valid only when resampling.method is specified as "cv".
c("grid", "random"), optional
Specifies the method to activate parameter selection.
If not specified, parameter selection is not triggered.
integer, optional
Specifies the number of times to randomly select candidate parameters for selection.
Mandatory and valid only when param.search.strategy is "random".
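The "mandatory and valid only when" conditions on fold.num and random.search.times can be checked before calling the function. The base R helper below, check_search_args, is hypothetical and not part of hana.ml.r; it simply encodes the two rules as stated in the descriptions above.

```r
# Hypothetical helper (not part of hana.ml.r): returns a character vector
# describing violations of the documented argument constraints.
check_search_args <- function(resampling.method = NULL,
                              fold.num = NULL,
                              param.search.strategy = NULL,
                              random.search.times = NULL) {
  problems <- character(0)
  is.cv <- identical(resampling.method, "cv")
  is.random <- identical(param.search.strategy, "random")

  # fold.num: mandatory and valid only with resampling.method = "cv".
  if (is.cv && is.null(fold.num)) {
    problems <- c(problems, "fold.num is mandatory when resampling.method is 'cv'")
  }
  if (!is.cv && !is.null(fold.num)) {
    problems <- c(problems, "fold.num is valid only when resampling.method is 'cv'")
  }

  # random.search.times: mandatory and valid only with "random" search.
  if (is.random && is.null(random.search.times)) {
    problems <- c(problems, "random.search.times is mandatory for 'random' search")
  }
  if (!is.random && !is.null(random.search.times)) {
    problems <- c(problems, "random.search.times is valid only for 'random' search")
  }
  problems
}

# A consistent combination: cross-validation with grid search.
ok <- check_search_args(resampling.method = "cv", fold.num = 5,
                        param.search.strategy = "grid")

# An inconsistent combination: fold.num with bootstrap, and random
# search without random.search.times.
bad <- check_search_args(resampling.method = "bootstrap", fold.num = 5,
                         param.search.strategy = "random")
print(ok)
print(bad)
```

Here ok is empty, while bad lists one problem for fold.num and one for random.search.times.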
integer, optional
Specifies maximum running time for model evaluation or parameter selection in seconds.
No timeout when 0 is specified.
character, optional
Sets an ID of progress indicator for model evaluation or parameter selection.
No progress indicator is active if no value is provided.
An R6 object of class "DecisionTreeRegressor", with the following attributes and public methods:
Attributes
model: DataFrame
Trained model content.
decision.rules: DataFrame
Rules for decision tree to make decisions.
Methods
CreateModelState(model=NULL, algorithm=NULL, func=NULL, state.description="ModelState", force=FALSE)
Usage:
> dtr <- hanaml.DecisionTreeRegressor(data=df)
> dtr$CreateModelState()
Arguments:
model: DataFrame
DataFrame containing the model for parsing. Defaults to self$model.
algorithm: character
Specifies the PAL algorithm associated with model. Defaults to self$pal.algorithm.
func: character
Specifies the functionality for Unified Classification/Regression.
Defaults to self$func.
state.description: character
A summary string for the generated model state.
Defaults to "ModelState".
force: logical
Specifies whether or not to replace the existing state for model.
Defaults to FALSE.
After calling this method, an attribute state that contains the parsed info for model
is assigned to the corresponding R6 object.
DeleteModelState(state=NULL)
Usage:
Assuming we have trained a hanaml model and created its model state, like the following:
> dtr <- hanaml.DecisionTreeRegressor(data=df)
> dtr$CreateModelState()
After using the model state for real-time scoring, we can delete the state by calling
> dtr$DeleteModelState()
Arguments:
state: DataFrame
DataFrame containing the state info. Defaults to self$state.
After calling this method, the specified model state is cleaned up and the associated memory is released.
Input DataFrame data:
> head(data$Collect(),5)
  OUTLOOK TEMP HUMIDITY WINDY CLASS
1   Sunny   75       70   Yes     1
2   Sunny   80       90   Yes     0
3   Sunny   85       85    No     0
4   Sunny   72       95    No     0
5   Sunny   69       70    No     1
Call the function:
> dtr <- hanaml.DecisionTreeRegressor(data,
      features = list("OUTLOOK", "TEMP", "HUMIDITY", "WINDY"),
      label = "CLASS",
      min.records.of.parent = 2,
      min.records.of.leaf = 1,
      thread.ratio = 0.4,
      split.threshold = 1e-5,
      model.format = "pmml",
      output.rules = TRUE)
OR call the function with formula:
> dtr <- hanaml.DecisionTreeRegressor(data,
      formula = CLASS~OUTLOOK+TEMP+HUMIDITY+WINDY,
      key = NULL,
      min.records.of.parent = 2,
      min.records.of.leaf = 1,
      thread.ratio = 0.4,
      split.threshold = 1e-5,
      model.format = "pmml",
      output.rules = TRUE)