HANA R-Random Forest Regression

Properties that can be configured for the HANA R-Random Forest Regression algorithm.

Random Forest is a popular ensemble method used for classification and regression tasks. The algorithm constructs a set of decision trees at training time. For a regression task, the output is the mean of the predictions made by the individual trees. Compared to other regression algorithms, this ensemble method often yields better accuracy and generalization on business datasets.
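
The averaging step described above can be illustrated with a minimal sketch. This is not the randomForest implementation: it assumes a single numeric feature and uses hypothetical one-split trees fitted on bootstrap samples, purely to show how the per-tree predictions are averaged into one regression output. The names fit_stump, fit_forest and predict_forest are made up for this example.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree on a single numeric feature."""
    best = None
    # Try every distinct value except the largest as a split threshold,
    # so that both sides of the split are non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        # Sum of squared errors of the two leaf means.
        sse = sum((y - left_mean) ** 2 for y in left)
        sse += sum((y - right_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        # All x values are identical: fall back to the overall mean.
        overall = mean(ys)
        return lambda x: overall
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def fit_forest(xs, ys, n_trees=25, seed=0):
    """Fit n_trees stumps, each on a bootstrap sample of the data."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return trees


def predict_forest(trees, x):
    """Regression output: the mean of the individual tree predictions."""
    return mean(tree(x) for tree in trees)
```

A full Random Forest also grows deeper trees and samples a subset of features at each split; the sketch leaves both out so that the averaging in predict_forest stays easy to follow.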

The R package that implements the algorithm is randomForest.

Note

Each dataset feature supports a maximum of 53 levels. A level is a distinct category or type of value that a variable can take; for example, the column "Gender" has two levels, "Male" and "Female". A categorical variable therefore cannot have more than 53 distinct values.
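
The level limit can be checked before the data is passed to the algorithm. The following is a minimal sketch, assuming the dataset is available as a list of dictionaries (one per row); the function name columns_over_level_limit and the row layout are assumptions made for illustration and are not part of the product.

```python
MAX_LEVELS = 53  # limit stated in the note above


def columns_over_level_limit(rows, categorical_columns, max_levels=MAX_LEVELS):
    """Return {column: level_count} for columns exceeding max_levels.

    rows is a list of dicts; missing or None values are not counted
    as a level (an assumption of this sketch).
    """
    over = {}
    for col in categorical_columns:
        levels = {row[col] for row in rows if row.get(col) is not None}
        if len(levels) > max_levels:
            over[col] = len(levels)
    return over
```

For example, a "Gender" column with two levels passes the check, while a column holding 60 distinct city names would be reported with its level count.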

HANA R-Random Forest Regression Properties
Table 1: Algorithm Properties
| Property | Description |
| --- | --- |
| Features | Select the input columns with which you want to perform the analysis. |
| Target Columns | Select the target column on which you want to perform the analysis. |
| Number of Trees to Grow | The number of trees to grow in the Random Forest. Set a value between 5 and 1000, inclusive. |
| Minimum terminal nodes | The minimum number of terminal nodes in the decision tree. Set a value between 10 and 500, inclusive. |
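
The two numeric ranges listed above can be validated before a run is configured. This is a minimal sketch under stated assumptions: the function validate_properties and its argument names are hypothetical and only mirror the property names in the table; they do not correspond to a documented API.

```python
# Inclusive ranges taken from the property descriptions above.
TREES_RANGE = (5, 1000)
TERMINAL_NODES_RANGE = (10, 500)


def validate_properties(number_of_trees, minimum_terminal_nodes):
    """Return a list of error messages; an empty list means both values are valid."""
    errors = []
    checks = [
        ("Number of Trees to Grow", number_of_trees, TREES_RANGE),
        ("Minimum terminal nodes", minimum_terminal_nodes, TERMINAL_NODES_RANGE),
    ]
    for name, value, (low, high) in checks:
        if not isinstance(value, int) or isinstance(value, bool):
            errors.append("%s must be an integer" % name)
        elif not low <= value <= high:
            errors.append("%s must be between %d and %d inclusive" % (name, low, high))
    return errors
```

Returning a list of messages instead of raising on the first problem lets a caller report every out-of-range property at once.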