HANA R-Random Forest Classification

Properties that can be configured for the HANA R-Random Forest Classification algorithm.

Overview:
Random Forest is a popular ensemble method that is used for classification and regression tasks. The algorithm works by constructing a set of decision trees at training time. For a classification task, the output class is determined by a majority vote across the individual decision trees in the forest. Compared to a single decision tree, this ensemble method often leads to better accuracy and generalization on business datasets.
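The majority-vote step can be illustrated with a minimal Python sketch. The function name `majority_vote` and the sample labels are hypothetical and chosen only for illustration; the randomForest package performs this aggregation internally, and its tie-breaking behavior may differ from the assumption made here.

```python
from collections import Counter

def majority_vote(tree_predictions):
    """Return the class label predicted by the most trees.

    tree_predictions: list of class labels, one per tree (hypothetical input).
    Ties go to the label that appears first in the list, which is an
    assumption of this sketch and not documented product behavior.
    """
    if not tree_predictions:
        raise ValueError("at least one tree prediction is required")
    counts = Counter(tree_predictions)
    # most_common(1) returns a one-element list: [(label, count)]
    label, _ = counts.most_common(1)[0]
    return label

# Five hypothetical trees vote on one record: three say "churn", two say "stay".
votes = ["churn", "stay", "churn", "churn", "stay"]
print(majority_vote(votes))  # churn
```

Growing more trees adds more votes per record, which tends to stabilize the predicted class at the cost of longer training time.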

The R package that implements the algorithm is randomForest.

Note

Each feature in the dataset can have a maximum of 53 levels (distinct categories).
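Before running the algorithm, it can help to check which features exceed this limit. The following Python sketch assumes the data is available as a list of dictionaries, one per row; the function name, the constant, and the sample columns are hypothetical. It counts distinct values for every listed column, so pass only the categorical ones.

```python
MAX_LEVELS = 53  # limit stated in the note above

def features_over_level_limit(rows, feature_names, max_levels=MAX_LEVELS):
    """Return {feature: distinct_count} for features exceeding max_levels.

    rows: list of dicts mapping column name to value (hypothetical layout).
    feature_names: categorical columns to check.
    """
    over = {}
    for name in feature_names:
        # A set keeps one copy of each value, so its size is the level count.
        distinct = {row[name] for row in rows}
        if len(distinct) > max_levels:
            over[name] = len(distinct)
    return over

# Hypothetical data: "city" has 60 distinct values, "segment" has 2.
rows = [{"city": f"city_{i}", "segment": "A" if i % 2 else "B"} for i in range(60)]
print(features_over_level_limit(rows, ["city", "segment"]))  # {'city': 60}
```

A feature flagged by this check would need to be excluded or have its rare categories grouped before it is selected under Features.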

HANA R-Random Forest Classification Properties
Table 1: Algorithm Properties
Property                   Description
Features                   Select the input columns with which you want to perform the analysis.
Target Columns             Select the target column on which you want to perform the analysis.
Number of Trees to Grow    The number of trees to grow in the random forest. Valid values are 5 to 1000, inclusive.
Minimum terminal nodes     The minimum number of terminal nodes in each decision tree. Valid values are 10 to 500, inclusive.
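The documented ranges can be checked before a configuration is submitted. The following Python sketch is only an illustration: the class `RandomForestSettings` and its field names are hypothetical and are not part of the product or of the randomForest package. Only the numeric bounds come from the table above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RandomForestSettings:
    """Hypothetical container for the two numeric properties in Table 1."""
    number_of_trees: int
    minimum_terminal_nodes: int

    def __post_init__(self):
        # Both ranges are inclusive, as stated in the property descriptions.
        if not 5 <= self.number_of_trees <= 1000:
            raise ValueError("Number of Trees to Grow must be between 5 and 1000")
        if not 10 <= self.minimum_terminal_nodes <= 500:
            raise ValueError("Minimum terminal nodes must be between 10 and 500")

settings = RandomForestSettings(number_of_trees=500, minimum_terminal_nodes=10)
print(settings.number_of_trees)  # 500

try:
    RandomForestSettings(number_of_trees=2000, minimum_terminal_nodes=10)
except ValueError as err:
    print(err)  # Number of Trees to Grow must be between 5 and 1000
```

Validating at construction time means an out-of-range value is reported immediately instead of when the analysis runs.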