You can configure properties for the Partition component in HANA and non-HANA scenarios.
The Partition component randomly partitions an input dataset into three subsets called Train, Test, and Validate. The proportion of each subset is defined as a parameter. The union of the three subsets need not be the complete initial dataset, so some rows may be left out of every subset.
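The random split described above can be sketched in plain Python. This is only an illustration of the idea, not the component's actual implementation; the function name `random_partition` and its parameters are hypothetical.

```python
import random

def random_partition(rows, train_pct, test_pct, validate_pct, seed=None):
    """Randomly split rows into Train, Test, and Validate subsets by percentage.

    Hypothetical helper for illustration only. The percentages may sum to
    less than 100, in which case the remaining rows belong to no subset.
    """
    if train_pct + test_pct + validate_pct > 100:
        raise ValueError("percentages must not sum to more than 100")

    rng = random.Random(seed)      # a fixed seed makes the split repeatable
    shuffled = list(rows)
    rng.shuffle(shuffled)

    n = len(shuffled)
    n_train = int(n * train_pct / 100)
    n_test = int(n * test_pct / 100)
    n_validate = int(n * validate_pct / 100)

    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validate = shuffled[n_train + n_test:n_train + n_test + n_validate]
    return train, test, validate

# 60% + 20% + 10% = 90%, so 10 of the 100 rows end up in no subset.
train, test, validate = random_partition(range(100), 60, 20, 10, seed=42)
```

Because the subsets are consecutive slices of one shuffled list, they are disjoint by construction.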
Partitioning can also be stratified. In that case, the dataset needs to have at least one categorical attribute (for example, of type varchar). The initial dataset is first subdivided according to the distinct values of this attribute. Each of these mutually exclusive subsets is then randomly split to obtain the Train, Test, and Validate subsets. This ensures that all categorical values, or strata, are present in the sampled subsets.
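A minimal Python sketch of this per-stratum split follows. It is an illustration under stated assumptions rather than the component's implementation: rows are assumed to be dictionaries, and `stratified_partition` and the `color` attribute are hypothetical names.

```python
import random
from collections import defaultdict

def stratified_partition(rows, key, train_pct, test_pct, validate_pct, seed=None):
    """Split each stratum (rows sharing the same value of `key`) separately.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)

    # Step 1: subdivide the dataset by the categorical attribute.
    strata = defaultdict(list)
    for row in rows:
        strata[row[key]].append(row)

    # Step 2: randomly split every stratum with the same proportions.
    parts = {"Train": [], "Test": [], "Validate": []}
    for group in strata.values():
        rng.shuffle(group)
        n = len(group)
        n_train = int(n * train_pct / 100)
        n_test = int(n * test_pct / 100)
        n_validate = int(n * validate_pct / 100)
        parts["Train"] += group[:n_train]
        parts["Test"] += group[n_train:n_train + n_test]
        parts["Validate"] += group[n_train + n_test:n_train + n_test + n_validate]
    return parts

colors = ["red"] * 50 + ["blue"] * 30 + ["green"] * 20
rows = [{"id": i, "color": c} for i, c in enumerate(colors)]
parts = stratified_partition(rows, "color", 60, 20, 20, seed=1)
```

In this sketch a very small stratum can round down to zero rows for a subset, so the guarantee that every stratum appears in every subset only holds when each stratum has enough rows.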
Note that when comparing two or more algorithms in the model comparison chain, the Partition component is mandatory.
| Property | Description |
|---|---|
| Partition Method | Select the method for partitioning data into train, test, and validation sets. |
| Random Seed | Enter a number to use as the seed for the random partitioning. |
| Partition Rows by | Select the method for partitioning rows. |
| Train Set | Enter the number of rows or percentage of rows for the train set. |
| Test Set | Enter the number of rows or percentage of rows for the test set. |
| Validation Set | Enter the number of rows or percentage of rows for the validation set. |
| Partition Column Name | Enter a name for the new column that contains partitioned values. |
| Number of Threads | Enter the number of threads the algorithm should use for execution. |
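
To show how the size, seed, and partition column properties relate to one another, here is a hedged Python sketch. It is not the component's implementation: the helper names, the `by_percentage` flag, the label values `Train`, `Test`, and `Validate`, and the column name `PartitionType` are all assumptions made for illustration.

```python
import random

def resolve_size(value, total_rows, by_percentage):
    """Turn a Train/Test/Validation setting into a row count (hypothetical)."""
    return int(total_rows * value / 100) if by_percentage else int(value)

def add_partition_column(rows, column_name, train, test, validation,
                         by_percentage=True, seed=None):
    """Return copies of rows with a new column naming each row's subset."""
    n = len(rows)
    sizes = [resolve_size(v, n, by_percentage) for v in (train, test, validation)]
    if sum(sizes) > n:
        raise ValueError("requested subsets exceed the number of rows")

    # Build one label per row; rows left out of every subset get None.
    labels = []
    for name, size in zip(("Train", "Test", "Validate"), sizes):
        labels += [name] * size
    labels += [None] * (n - len(labels))

    # The seed makes the random assignment repeatable.
    random.Random(seed).shuffle(labels)
    return [{**row, column_name: label} for row, label in zip(rows, labels)]

data = [{"id": i} for i in range(10)]
# Sizes given as row counts: 5 + 3 + 1 = 9, so one row stays unassigned.
labelled = add_partition_column(data, "PartitionType", 5, 3, 1,
                                by_percentage=False, seed=7)
```

Running the same call with the same seed yields the same assignment, which is the usual reason for exposing a seed as a setting.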