R: Decision Tree Model for Regression

hanaml.DecisionTreeRegressor {hana.ml.r}

R Documentation

Decision Tree Model for Regression

Description

hanaml.DecisionTreeRegressor is an R wrapper for the decision tree algorithm of the SAP HANA Predictive Analysis Library (PAL), used here for regression.

Usage

hanaml.DecisionTreeRegressor(conn.context, data = NULL,
                             key = NULL, features = NULL,
                             label = NULL, formula = NULL,
                             thread.ratio = NULL,
                             allow.missing.dependent = NULL,
                             percentage = NULL,
                             min.records.of.parent = NULL,
                             min.records.of.leaf = NULL, max.depth = NULL,
                             categorical.variable = NULL,
                             split.threshold = NULL,
                             use.surrogate = NULL, model.format = NULL,
                             discretization.type = NULL,
                             bins = NULL, max.branch = NULL,
                             merge.threshold = NULL,
                             output.rules = NULL
                             )

Arguments

`conn.context`	`ConnectionContext` The connection to the SAP HANA system.
`data`	`DataFrame` DataFrame containing the data.
`key`	`character, optional` Name of the ID column of data. If not provided, then data is assumed to have no ID column.
`features`	`list of character, optional` Names of the feature columns. If not provided, it defaults to all columns other than the ID and label columns.
`label`	`character, optional` Name of the column in data that specifies the dependent variable. Defaults to the last non-ID column if not provided.
`formula`	`formula type, optional` Formula to be used for model generation, in the form `label ~ <feature_list>`, e.g. `formula = CATEGORY ~ V1 + V2 + V3`. Provide either a formula or a combination of features and label, not both. Defaults to NULL.
`thread.ratio`	`double, optional` Controls the proportion of available threads that can be used. The value range is from 0 to 1, where 0 indicates a single thread and 1 indicates up to all available threads. Values between 0 and 1 use up to that fraction of the available threads. For other values, the number of threads is determined heuristically. Defaults to -1.
`allow.missing.dependent`	`logical, optional` Specifies whether a missing target value is allowed. If FALSE, an error occurs when a missing target value is present. If TRUE, records with a missing target value are removed. Defaults to TRUE.
`percentage`	`double, optional` Specifies the percentage of the input data that will be used to build the tree model. The rest of the data will be used for pruning. Defaults to 1.0.
`min.records.of.parent`	`integer, optional` Specifies the stop condition. If the number of records in a node is less than the specified value, that node is not split further. Defaults to 2.
`min.records.of.leaf`	`integer, optional` Guarantees the minimum number of records in a leaf. Defaults to 1.
`max.depth`	`integer, optional` The maximum depth of a tree. By default it is unlimited.
`categorical.variable`	`list of characters, optional` Names of features that should be treated as categorical. By default, the variable type is detected from the input data: string columns are treated as categorical, and integer and double columns as continuous. This parameter is valid only for integer columns; it is ignored for other column types.
`split.threshold`	`double, optional` Specifies the stop condition for a node. For CART, a node is not split further if the reduction in Gini index or relative MSE achieved by the best split is less than this value. The smaller the value, the larger the CART tree grows. Defaults to 1e-5 for CART.
`use.surrogate`	`logical, optional` Indicates whether to use a surrogate split when NULL values are encountered. FALSE does not use surrogate split. TRUE uses a surrogate split. Only valid for CART. Defaults to TRUE.
`model.format`	`character, optional` Specifies the format in which the tree model is stored. Valid options are 'json' and 'pmml'. Defaults to 'json'.
`discretization.type`	`character, optional` Specifies the strategy for discretizing continuous attributes. Valid options are mdlpc and equal_freq. Valid only for C45 and CHAID. Defaults to 'mdlpc'.
`bins`	`list, optional` Specifies the number of bins for discretization. Each element of the list must be named: names are column names and values are the number of bins for that column. Only valid when discretization.type is 'equal_freq'. Defaults to 10 for each column.
`max.branch`	`integer, optional` Specifies the maximum number of branches. Defaults to 10.
`merge.threshold`	`double, optional` Specifies the merge condition for CHAID: if the metric value is greater than or equal to the specified value, the algorithm will merge two branches.
`output.rules`	`logical, optional` Specifies whether to output decision rules. If TRUE, decision rules are output; if FALSE, they are not. Defaults to TRUE.
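How `min.records.of.parent`, `min.records.of.leaf` and `split.threshold` stop a node from splitting can be sketched in base R for a single candidate split. This is an illustration only, not the PAL or hana.ml.r implementation: the helpers `node_mse`, `relative_mse_reduction` and `should_split` are hypothetical, and the definition of relative MSE reduction used here, (parent MSE - weighted child MSE) / parent MSE, is an assumption. For a regression tree the MSE criterion applies; the Gini index is its classification counterpart.

```r
# Hypothetical helpers, NOT part of hana.ml.r or PAL.
# Mean squared error of a node's target values around the node mean.
node_mse <- function(y) mean((y - mean(y))^2)

# ASSUMPTION: relative MSE reduction = (MSE(parent) - weighted MSE(children)) / MSE(parent).
# `goes_left` is a logical vector saying which records go to the left child.
relative_mse_reduction <- function(y, goes_left) {
  left <- y[goes_left]
  right <- y[!goes_left]
  if (length(left) == 0 || length(right) == 0) return(0)  # not a real split
  parent <- node_mse(y)
  if (parent == 0) return(0)                               # a pure node cannot improve
  n <- length(y)
  children <- (length(left) * node_mse(left) + length(right) * node_mse(right)) / n
  (parent - children) / parent
}

# Hypothetical stop check mirroring the documented arguments and defaults.
should_split <- function(y, goes_left,
                         min.records.of.parent = 2,
                         min.records.of.leaf = 1,
                         split.threshold = 1e-5) {
  # Too few records in the node: stop splitting.
  if (length(y) < min.records.of.parent) return(FALSE)
  # A child would have fewer records than a leaf must contain.
  if (min(sum(goes_left), sum(!goes_left)) < min.records.of.leaf) return(FALSE)
  # The best split does not reduce the error enough.
  relative_mse_reduction(y, goes_left) >= split.threshold
}

# Usage with made-up target values for one node.
y <- c(1, 1, 0, 0, 1)
goes_left <- c(TRUE, TRUE, FALSE, FALSE, TRUE)
relative_mse_reduction(y, goes_left)  # 1: both children are pure
should_split(y, goes_left)            # TRUE
```

Under these assumptions, raising `split.threshold` rejects more candidate splits and therefore yields a smaller tree, which matches the description of the argument above.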

Format

R6Class object.

Value

A "DecisionTreeRegressor" object with the following attributes:

model : DataFrame Trained model content.

decision.rules : DataFrame Rules for decision tree to make decisions.

confusion.matrix : DataFrame Confusion matrix used to evaluate the performance of classification algorithms.

Note

Using Summary and Print

Summary provides a general summary of the output of the model. Usage: summary(dtr), where dtr is the generated model.

Print provides information on the coefficients and the optional parameter values given by the user. Usage: print(dtr), where dtr is the generated model.

Examples

## Not run: 
Input DataFrame for training:

> head(data$Collect(),5)
  OUTLOOK TEMP HUMIDITY WINDY CLASS
1   Sunny   75       70   Yes     1
2   Sunny   80       90   Yes     0
3   Sunny   85       85    No     0
4   Sunny   72       95    No     0
5   Sunny   69       70    No     1

Creating DecisionTreeRegressor model:

> dtr = hanaml.DecisionTreeRegressor(conn, data = data,
                             features = list("OUTLOOK", "TEMP", "HUMIDITY", "WINDY"),
                             label = "CLASS",
                             min.records.of.parent = 2, min.records.of.leaf = 1,
                             thread.ratio = 0.4, split.threshold = 1e-5,
                             model.format = 'pmml', output.rules = TRUE)

Specifying the inputs with a formula instead:

> dtr = hanaml.DecisionTreeRegressor(conn, data = data,
                             formula = CLASS ~ OUTLOOK + TEMP + HUMIDITY + WINDY,
                             key = NULL,
                             min.records.of.parent = 2, min.records.of.leaf = 1,
                             thread.ratio = 0.4, split.threshold = 1e-5,
                             model.format = 'pmml', output.rules = TRUE)


## End(Not run)
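Because `formula` is an alternative to `features` plus `label`, it can help to see which label and features a formula names. The base-R sketch below needs no SAP HANA connection; it extracts them from a formula over the OUTLOOK/TEMP/HUMIDITY/WINDY/CLASS training data shown above. It only illustrates the correspondence and does not claim that hanaml.DecisionTreeRegressor parses formulas this way internally.

```r
# Plain base R (stats is attached by default): no SAP HANA connection required.
f <- CLASS ~ OUTLOOK + TEMP + HUMIDITY + WINDY

# Left-hand side of the formula: the dependent variable (the `label` argument).
label <- all.vars(f[[2]])

# Right-hand side terms: the feature columns (the `features` argument).
features <- attr(terms(f), "term.labels")

# The same names in the list form used by the `features` argument.
features.list <- as.list(features)

print(label)     # [1] "CLASS"
print(features)  # [1] "OUTLOOK"  "TEMP"     "HUMIDITY" "WINDY"
```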

[Package hana.ml.r version 1.0.8 Index]