HANA R-CNR Tree

Properties that can be configured for the HANA R-CNR Tree algorithm.

Use this algorithm to classify observations into groups and predict one or more discrete variables based on other variables. You can also use it to find trends in data.
Note
  • The "rpart" package, which is part of R 2.15, cannot handle column names with spaces or special characters. It supports only input column names in the format supported by an R data frame.
  • The independent column names used when scoring the model must be the same as the independent column names used when creating the model.
  • Column names containing spaces or any special character other than a period (.) are not supported.
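The naming constraints in the notes above can be checked before a model is created. The following is a minimal Python sketch, not part of the product. The helper name `unsupported_column_names` is hypothetical, and the check assumes, reading the notes literally, that only letters, digits, and periods are acceptable.

```python
import re

# Assumption: a supported name contains only ASCII letters, digits, and periods.
_SUPPORTED = re.compile(r"[A-Za-z0-9.]+")

def unsupported_column_names(names):
    """Return the names that contain spaces or special characters other than '.'."""
    return [name for name in names if not _SUPPORTED.fullmatch(name)]

columns = ["Sepal.Length", "Petal Width", "class#1", "Species"]
print(unsupported_column_names(columns))  # ['Petal Width', 'class#1']
```

Running the same check on the columns used for scoring helps confirm that they match the names used when the model was created.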
HANA R-CNR Tree Properties
Table 1: Algorithm Properties
Property Description
Output Mode Select the mode in which you want to use the output of this algorithm.
Possible values:
  • Trend: Predicts the values for the dependent column and adds an extra column in the output containing the predicted values.
  • Fill: Fills missing values in the target column.
Features Select the input columns with which you want to perform the analysis.
Target Variable Select the target column for which you want to perform the analysis.
Missing Values Select the method for handling missing values.
Possible values:
  • Ignore: The algorithm skips the records containing missing values in an independent column or the dependent column.
  • Keep: The algorithm retains the records containing missing values during calculation.
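The difference between the two Missing Values options can be illustrated with a small Python sketch. It is only an illustration of the described behavior, not the product's implementation. The function name `apply_missing_policy` and the use of `None` to mark a missing value are assumptions.

```python
def apply_missing_policy(records, columns, policy):
    """Filter records according to the Missing Values option.

    records: list of dicts, one per record; None marks a missing value (assumption).
    columns: the independent and dependent column names to inspect.
    """
    if policy == "Keep":
        # Every record is retained, including those with missing values.
        return list(records)
    if policy == "Ignore":
        # Records with a missing value in any inspected column are skipped.
        return [r for r in records if all(r.get(c) is not None for c in columns)]
    raise ValueError(f"unknown policy: {policy!r}")

rows = [
    {"age": 34, "income": 52000, "churn": "no"},
    {"age": None, "income": 48000, "churn": "yes"},
]
print(len(apply_missing_policy(rows, ["age", "income", "churn"], "Ignore")))  # 1
print(len(apply_missing_policy(rows, ["age", "income", "churn"], "Keep")))    # 2
```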
Algorithm Type Select the type of analysis you want the algorithm to perform.
Possible values:
  • Classification: Use this method if the dependent variable has categorical values.
  • Regression: Use this method if the dependent variable has numerical values.
Minimum Split Enter the minimum number of observations required for splitting a node. The default value is 10.
Split Criteria Select the splitting criteria of the node.
Possible values:
  • Gini: Gini impurity.
  • Information: Information gain.
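To make the two split criteria concrete, the following Python sketch computes the Gini impurity and the entropy of a set of class labels, and the gain of a candidate split under either measure. It uses the textbook definitions and is not taken from the product or from "rpart". The function names are hypothetical, and the label lists are assumed to be non-empty.

```python
from collections import Counter
from math import log2

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits; information gain is a reduction in this value."""
    n = len(labels)
    return -sum((count / n) * log2(count / n) for count in Counter(labels).values())

def split_gain(parent, left, right, impurity):
    """Impurity of the parent minus the size-weighted impurity of its two children."""
    n = len(parent)
    weighted = len(left) / n * impurity(left) + len(right) / n * impurity(right)
    return impurity(parent) - weighted

parent = ["yes"] * 4 + ["no"] * 4
left = ["yes"] * 3 + ["no"]
right = ["yes"] + ["no"] * 3
print(round(split_gain(parent, left, right, gini), 4))     # 0.125
print(round(split_gain(parent, left, right, entropy), 4))  # 0.1887
```

Under either criterion, the candidate split with the largest gain is the one preferred at a node.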
Predicted Column Name Enter a name for the newly created column that contains the predicted values.
Complexity Parameter Enter the complexity parameter. It saves computing time by preventing any split that does not improve the fit by at least this factor. The default value is 0.005.
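One way to picture the effect of the complexity parameter is the Python sketch below. It assumes that the improvement of a split is measured as the reduction in error relative to the error of the root node, which mirrors how the "rpart" documentation describes the parameter for regression trees. The function name `split_is_attempted` and its arguments are hypothetical, and the product may measure improvement differently.

```python
def split_is_attempted(root_error, node_error, children_error, cp=0.005):
    """Return True if the split improves the fit by at least cp.

    Assumption: improvement is the error reduction achieved by the split,
    expressed as a fraction of the root node's error.
    """
    relative_improvement = (node_error - children_error) / root_error
    return relative_improvement >= cp

# A split that removes 0.5 of 200 units of error (0.25%) is below the default of 0.5%.
print(split_is_attempted(200.0, 50.0, 49.5))  # False
# A split that removes 5 of 200 units of error (2.5%) is kept.
print(split_is_attempted(200.0, 50.0, 45.0))  # True
```

A larger value prunes more candidate splits and yields a smaller tree; a smaller value allows the tree to grow further.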
Maximum Depth Enter the maximum node level in the final tree with the root node counted as level 0.
Note: On 32-bit machines, the algorithm does not produce the expected results if the maximum depth is greater than 30.
Cross Validation Enter the number of cross-validations. A higher value increases the computation time but produces more accurate results.
Prior Probability Enter the vector of prior probabilities.
Use Surrogate Select how surrogates are used in the splitting process.
Possible values:
  • Display Only: An observation with a missing value for the primary split rule is not sent further down the tree.
  • Use Surrogate: Surrogates are used to split observations that are missing the primary variable; if all surrogates are missing, the observation is not split.
  • Stop if missing: If all surrogates are missing, the observation is sent in the majority direction.
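The three Use Surrogate options can be sketched in Python as a routing decision for a single observation at a single node. This is a simplified illustration of the descriptions above, not the product's implementation. The function name `route`, the representation of a split as a (column, threshold) pair, and the rule that values below the threshold go left are all assumptions.

```python
MODES = ("Display Only", "Use Surrogate", "Stop if missing")

def _direction(value, threshold):
    # Assumption: values below the threshold go left, all others go right.
    return "left" if value < threshold else "right"

def route(observation, primary, surrogates, mode, majority):
    """Return 'left' or 'right', or None if the observation stays at this node.

    primary: (column, threshold) pair for the primary split.
    surrogates: list of (column, threshold) pairs, best surrogate first.
    majority: the direction taken by most observations at this node.
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    column, threshold = primary
    if observation.get(column) is not None:
        return _direction(observation[column], threshold)
    if mode == "Display Only":
        return None  # missing primary value: not sent further down the tree
    for column, threshold in surrogates:
        if observation.get(column) is not None:
            return _direction(observation[column], threshold)
    # All surrogates are missing as well.
    return None if mode == "Use Surrogate" else majority

obs = {"age": None, "income": 30000, "tenure": None}
primary = ("age", 40)
print(route(obs, primary, [("income", 45000)], "Display Only", "right"))    # None
print(route(obs, primary, [("income", 45000)], "Use Surrogate", "right"))   # left
print(route(obs, primary, [("tenure", 5)], "Use Surrogate", "right"))       # None
print(route(obs, primary, [("tenure", 5)], "Stop if missing", "right"))     # right
```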
Surrogate Style Enter the style that controls the selection of the best surrogate.
Possible values:
  • Use total correct classification: The algorithm uses the total number of correct classifications to find a potential surrogate variable.
  • Use percent non missing cases: The algorithm uses the percentage of non-missing cases classified to find a potential surrogate variable.
Maximum Surrogate Enter the maximum number of surrogates to be retained at each node in a tree.
Show Probability Select the Show Probability check box to get the probability of predicted values during scoring of a classification model.