Input Datasets

Depending on where you are in the predictive model lifecycle, your input dataset can be a training dataset or an application dataset (for classification and regression predictive models), or both at once (for time series predictive models, which use a single dataset).

Restriction

For live datasets, any data changes you make to the tables and SQL views in your SAP HANA on-premise system appear immediately in the live datasets. However, to update your predictive model, you need to retrain it.

Training Dataset

The training dataset contains the past observations that will be used to generate the predictive model. In this set, the values of the target variable, which is the variable corresponding to your business issue, are known. By analyzing the training dataset, Smart Predict generates a predictive model that explains and predicts the target variable, based on the variables identified as Influencers.

Application Dataset

You apply a predictive model on an application dataset (for classification and regression predictive models).

This dataset must have the same information structure as the corresponding training dataset, as follows:

The same number of variables (additional columns will be ignored),
The same variable names as the corresponding training dataset.
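
As an illustration only, the structure requirement above can be checked before you apply a predictive model. The following is a minimal sketch in plain Python, not a Smart Predict API: the function name compare_structure and the example column names are hypothetical, and the column lists are assumed to have been exported from your datasets by other means.

```python
def compare_structure(training_columns, application_columns):
    """Compare column names of a training and an application dataset.

    Returns a tuple (missing, extra):
      missing - training variables absent from the application dataset
      extra   - application columns not present in the training dataset
                (per the text above, such additional columns are ignored)
    """
    training = list(training_columns)
    application = list(application_columns)
    training_set = set(training)
    application_set = set(application)
    missing = [name for name in training if name not in application_set]
    extra = [name for name in application if name not in training_set]
    return missing, extra


# Hypothetical column names, for illustration only.
missing, extra = compare_structure(
    ["customer_id", "age", "churn"],
    ["customer_id", "churn", "comment"],
)
print("Missing from application dataset:", missing)
print("Additional columns (ignored):", extra)
```

An empty missing list suggests that the application dataset carries every variable name used for training; it does not check data types or values.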

Note

Once the predictive model is applied, you will find the predicted values of the target in the generated output dataset.

Restriction

You need to take into account some restrictions for input datasets to ensure that the training and application of your predictive models use the available resources effectively:

Your training or application input dataset must not contain more than 1,000 columns. While applying the predictive model to an application dataset, Smart Predict generates additional columns. The application process can get blocked if your application dataset is already close to the limit of 1,000 columns. For more information, refer to System Sizing, Tuning, and Limits.
The following limits are recommended when using a segmented time series forecast model on an input training or application dataset:
- Number of forecasts (independent of the number of segments): 120 maximum
- Number of segments: 1,000 maximum
If your predictive model is configured for a number of forecasts or segments beyond the recommended maximum limits, then its use of resources is likely to create performance issues that can impact other users on the same tenant.
Your training and application input datasets must also come from the same type of data source. You can't apply a predictive model on a live dataset if it was trained with an acquired dataset, nor can you apply a predictive model on an acquired dataset if it was trained using a live one.
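
The restrictions above can be summarized as a simple pre-check. This is a hedged sketch in plain Python, not part of Smart Predict: the function check_input_dataset, its parameters, and the source labels "live" and "acquired" are assumptions made for illustration. The number of columns that Smart Predict generates during application is not stated here, so the sketch only checks the column count you pass in.

```python
MAX_COLUMNS = 1000    # limit on input dataset columns
MAX_FORECASTS = 120   # recommended maximum, segmented time series
MAX_SEGMENTS = 1000   # recommended maximum, segmented time series


def check_input_dataset(n_columns, n_forecasts=None, n_segments=None,
                        training_source=None, application_source=None):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if n_columns > MAX_COLUMNS:
        problems.append(
            f"{n_columns} columns exceeds the limit of {MAX_COLUMNS}")
    if n_forecasts is not None and n_forecasts > MAX_FORECASTS:
        problems.append(
            f"{n_forecasts} forecasts exceeds the recommended {MAX_FORECASTS}")
    if n_segments is not None and n_segments > MAX_SEGMENTS:
        problems.append(
            f"{n_segments} segments exceeds the recommended {MAX_SEGMENTS}")
    # Training and application datasets must share the same source type.
    if (training_source is not None and application_source is not None
            and training_source != application_source):
        problems.append(
            f"source mismatch: trained on {training_source}, "
            f"applied on {application_source}")
    return problems


# Hypothetical values, for illustration only.
for problem in check_input_dataset(
        n_columns=1200, n_forecasts=150, n_segments=40,
        training_source="acquired", application_source="live"):
    print(problem)
```

Because the application step adds columns, you may want to leave headroom below 1,000 columns rather than rely on the exact limit.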

Note

Empty values in your dataset remain empty and they appear in the Blank Count column in the Dataset Preview.
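
If you want to estimate the number of empty values per column before loading a dataset, a sketch like the following may help. It is plain Python, not a Smart Predict feature, and the function name blank_counts is hypothetical. How Smart Predict defines a blank is not stated here; this sketch assumes that a blank is a missing value (None) or a string that is empty or contains only whitespace.

```python
def blank_counts(rows):
    """Count blank values per column for a list of dict rows.

    Assumption: a value is blank if it is None or a string that is
    empty or whitespace-only.
    """
    counts = {}
    for row in rows:
        for column, value in row.items():
            is_blank = value is None or (
                isinstance(value, str) and value.strip() == "")
            counts[column] = counts.get(column, 0) + (1 if is_blank else 0)
    return counts


# Hypothetical rows, for illustration only.
rows = [
    {"age": 31, "city": ""},
    {"age": None, "city": "Paris"},
    {"age": 45, "city": "  "},
]
print(blank_counts(rows))
```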