HANA R-Bagging Classification

Properties that can be configured for the HANA R-Bagging Classification algorithm.

Overview:
The Bagging algorithm, also known as “Bootstrap aggregating”, is a popular ensemble method that can be applied to classification tasks. The algorithm draws random bootstrap samples (rows sampled with replacement) from the original dataset and trains a classifier on each sample. The predictions of these classifiers are then aggregated, typically by majority vote, to form the final prediction. Combining many classifiers in this way is designed to improve accuracy and robustness compared with a single classification algorithm on business datasets.
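The following Python sketch illustrates the bagging procedure described above: bootstrap sampling with replacement, one model per sample, and a majority vote over the models. It is not the component's implementation. To stay self-contained, it uses a decision stump (a one-level decision tree) in place of full rpart trees, and all function names (`fit_stump`, `predict_stump`, `bagging_fit`, `bagging_predict`) are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows, labels):
    """Fit a one-level decision tree (a stand-in for an rpart tree).

    Returns (feature_index, threshold, left_label, right_label) with the
    fewest training errors; ties keep the first threshold found.
    """
    overall = Counter(labels).most_common(1)[0][0]
    best = None
    for j in range(len(rows[0])):
        for threshold in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[j] > threshold]
            # Majority class on each side; an empty side falls back to the overall majority.
            left_label = Counter(left).most_common(1)[0][0] if left else overall
            right_label = Counter(right).most_common(1)[0][0] if right else overall
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, j, threshold, left_label, right_label)
    return best[1:]


def predict_stump(stump, row):
    j, threshold, left_label, right_label = stump
    return left_label if row[j] <= threshold else right_label


def bagging_fit(rows, labels, n_trees, seed=0):
    """Train n_trees stumps, each on its own bootstrap sample."""
    rng = random.Random(seed)
    n = len(rows)
    models = []
    for _ in range(n_trees):
        # Bootstrap sample: n row indices drawn uniformly with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return models


def bagging_predict(models, row):
    """Aggregate the individual predictions by majority vote."""
    votes = Counter(predict_stump(m, row) for m in models)
    return votes.most_common(1)[0][0]


# Usage: a toy one-feature dataset with two well-separated classes.
X = [[1.0], [2.0], [3.0], [7.0], [8.0], [9.0]]
y = ["low", "low", "low", "high", "high", "high"]
models = bagging_fit(X, y, n_trees=25)
print(bagging_predict(models, [1.0]), bagging_predict(models, [8.0]))
```

Because each stump sees a different bootstrap sample, individual stumps can disagree; the vote smooths out those differences, which is the source of the robustness the overview describes.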

The R packages that implement the algorithm are adabag and rpart.

Note

In this component, decision trees are used as the classification algorithm that is trained on each bootstrap sample.

Note

If column names contain a hyphen (-), use the Data Type component to rename those columns.
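The note does not say why hyphens cause problems; a plausible reason is that R's model-formula syntax treats `-` as an operator, so a name such as `order-id` would not be read as a single column. If you prepare data outside the component, a minimal Python sketch of equivalent renaming might look like this. The helper `sanitize_column_names` is hypothetical and not part of the product, and replacing hyphens with underscores is an assumed convention.

```python
def sanitize_column_names(names):
    """Hypothetical helper: replace hyphens with underscores in column names.

    Example: 'order-id' becomes 'order_id'. Within the component itself,
    renaming is done with the Data Type component instead.
    """
    renamed = [name.replace("-", "_") for name in names]
    # Renaming can create collisions ('a-b' and 'a_b' both become 'a_b'); fail loudly.
    if len(set(renamed)) != len(renamed):
        raise ValueError(f"renaming produced duplicate column names: {renamed}")
    return renamed


print(sanitize_column_names(["order-id", "amount", "ship-date"]))
```

Checking for collisions matters because two distinct source columns silently merging into one name would be harder to diagnose than an explicit error.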

HANA R-Bagging Classification Properties
Table 1: Algorithm Properties
Property: Description
Maximum Depth: Enter the maximum depth of any node in the final tree, with the root node counted as depth 0. Valid values are 1 to 20 inclusive.
Minimum Split: Enter the minimum number of observations that a node must contain for a split to be attempted. Valid values are 0 to 500 inclusive. The default value is 0.
Complexity Parameter: Enter the complexity parameter, which saves computing time by ruling out any split that does not improve the overall fit by at least the specified factor. The value must lie in the interval [-1, 1), that is, greater than or equal to -1 and less than 1.
Number of Trees to Use: Enter the number of decision trees to build for bagging. Each tree is trained on its own bootstrap sample, and their predictions are combined. Valid values are 5 to 500 inclusive.
Features: Select the input columns to use as predictors in the analysis.
Target Columns: Select the target column that contains the class labels to predict.
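The ranges in Table 1 can be checked before a model is configured. The Python sketch below assumes hypothetical field names (`max_depth`, `min_split`, `complexity`, `n_trees`) that map to the properties in the table; only the ranges and the Minimum Split default of 0 come from this page. Note the half-open interval for the complexity parameter: -1 is accepted, 1 is rejected.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BaggingProperties:
    # Hypothetical field names; ranges are taken from Table 1.
    max_depth: int        # Maximum Depth: 1 to 20 inclusive
    complexity: float     # Complexity Parameter: -1 <= value < 1
    n_trees: int          # Number of Trees to Use: 5 to 500 inclusive
    min_split: int = 0    # Minimum Split: 0 to 500 inclusive, default 0

    def errors(self) -> List[str]:
        """Return one message per property that is out of range."""
        problems = []
        if not 1 <= self.max_depth <= 20:
            problems.append("Maximum Depth must be between 1 and 20 inclusive")
        if not 0 <= self.min_split <= 500:
            problems.append("Minimum Split must be between 0 and 500 inclusive")
        if not -1 <= self.complexity < 1:
            problems.append("Complexity Parameter must be in [-1, 1)")
        if not 5 <= self.n_trees <= 500:
            problems.append("Number of Trees to Use must be between 5 and 500 inclusive")
        return problems


print(BaggingProperties(max_depth=5, complexity=0.01, n_trees=50).errors())
print(BaggingProperties(max_depth=0, complexity=1.0, n_trees=4).errors())
```

Collecting every violation, rather than stopping at the first, lets all invalid fields be reported at once; this is a design choice of the sketch, not documented component behavior.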