Partitioning Data

The Partition component splits a dataset into Train, Validate and Test partitions. You can configure the percentage of data assigned to each partition.
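As a rough illustration of what a percentage-based split means, the following Python sketch divides a list of rows into three partitions. It is not the Partition component's implementation. The function name partition, the seed parameter, the random shuffling and the rule that the percentages must sum to 100 are all assumptions made for this example.

```python
import random

def partition(rows, train_pct, validate_pct, test_pct, seed=0):
    """Shuffle rows and split them into Train, Validate and Test lists.

    Hypothetical helper for illustration only; requiring the percentages
    to sum to 100 is an assumption, not documented product behavior.
    """
    if train_pct + validate_pct + test_pct != 100:
        raise ValueError("percentages must sum to 100")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n = len(shuffled)
    n_train = n * train_pct // 100
    n_validate = n * validate_pct // 100
    train = shuffled[:n_train]
    validate = shuffled[n_train:n_train + n_validate]
    test = shuffled[n_train + n_validate:]  # remaining rows go to Test
    return train, validate, test

train, validate, test = partition(range(100), 60, 20, 20)
print(len(train), len(validate), len(test))  # prints: 60 20 20
```

Every row lands in exactly one partition, so the three lists together contain the full dataset with no overlap.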

A sound way to build predictive analytics models is to fit the models to training (Train) data and evaluate them on a separate validation (Validate) dataset. This way, you can tune the parameters of the algorithms based on how the model performs on data that was not used for fitting.

After the model parameters are optimized for best performance, the Test dataset, which is completely unseen during fitting and tuning, is used to pick the model that performs best.
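The Train, Validate and Test workflow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the synthetic data, the 60/20/20 split, the simple nearest-neighbor model and the names knn_predict and mean_squared_error are all hypothetical and are not part of Expert Analytics.

```python
import random

def knn_predict(train, x, k):
    """Predict y for x as the mean y of the k nearest training points."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def mean_squared_error(train, data, k):
    """Average squared prediction error on data for a model fitted to train."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)

# Synthetic data (assumption): y is roughly 2 * x plus noise.
rng = random.Random(42)
points = []
for _ in range(200):
    x = rng.uniform(0, 10)
    points.append((x, 2.0 * x + rng.gauss(0, 1)))

# 60% Train, 20% Validate, 20% Test.
train, validate, test = points[:120], points[120:160], points[160:]

# Tune the parameter k using only the Validate partition.
candidates = [1, 3, 5, 9, 15]
validate_errors = {k: mean_squared_error(train, validate, k) for k in candidates}
best_k = min(validate_errors, key=validate_errors.get)

# Report the tuned model's error on the untouched Test partition.
test_error = mean_squared_error(train, test, best_k)
print(best_k, round(test_error, 3))
```

The Test partition is touched only once, after tuning is finished, so its error estimate is not biased by the parameter search.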

The Partition component can be used with all algorithms in Expert Analytics, including PAL, APL and R, in both agnostic and HANA models.

You work with the Partition component in the Predict room. Double-click the Partition component under the Data Preparation list of components on the right-hand panel. The preprocessor component is added to the analysis editor and is automatically connected to the data source component. From the contextual menu of the preprocessor component, choose Configure Properties. In the component properties dialog box, enter the percentages for the Train, Test and Validate datasets and click Done. To view the results, click Run Analysis.