Datasets - Examples

Your dataset must follow a specific structure for Smart Predict to be able to use it.

How must a training dataset for a classification or regression predictive scenario be structured?

Example of a training dataset used for a classification predictive scenario

In the example below, you want to predict whether a customer will buy a product. You have prepared a training dataset containing historical data on customers and whether they previously bought a similar product. In this dataset, the values of the target variable ("did the customer buy the product?") are known.

Once you have created your predictive model, you apply it to an application dataset. This dataset contains the same information about the customers you want to target with the new product. The target variable column ("Will_Buy?") is empty, or may not exist at all, because it is what you want to predict:

Smart Predict uses the predictive model to calculate the probability that each customer will buy the product, and fills in the "Will_Buy?" column in the generated dataset:
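The relationship between the training and application datasets in this example can be expressed as a simple column check: every non-target column used for training should also be present in the application dataset, while the target column may be missing. The sketch below is illustrative only. The function name `check_application_columns` and the columns other than "Will_Buy?" are hypothetical, and Smart Predict performs its own validation.

```python
def check_application_columns(training_columns, application_columns, target):
    """Compare application columns with training columns.

    Returns a list of problems. The target column may be absent from the
    application dataset; every other training column is expected there.
    """
    problems = []
    if target not in training_columns:
        problems.append(f"target column {target!r} missing from training dataset")
    for column in training_columns:
        if column != target and column not in application_columns:
            problems.append(f"column {column!r} missing from application dataset")
    return problems


# Hypothetical column names for the customer example.
training = ["Customer_ID", "Age", "Region", "Will_Buy?"]
application = ["Customer_ID", "Age", "Region"]  # target column omitted: allowed

print(check_application_columns(training, application, "Will_Buy?"))
# []
print(check_application_columns(training, ["Customer_ID", "Age"], "Will_Buy?"))
# ["column 'Region' missing from application dataset"]
```

The same check applies to the regression example that follows, with the complaints column as the target.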

Example of a training dataset used for a regression predictive scenario

In the example below, you want to predict the number of complaints that your customer support will receive this week. You have prepared a training dataset containing historical values for several previous weeks. In this dataset, the values of your target variable ("how many complaints per week") are known:

You apply the regression predictive model to a new dataset that contains the same influencers. The value of the target variable for this week ("number of complaints this week") is unknown.

The predictive model predicts the number of complaints to expect this week.

How must a training dataset for a time series predictive scenario be structured?

Example of a dataset used for a time series predictive scenario (non-segmented):

In the example below, you want to forecast "product sales" for the next 3 months. You have prepared a dataset containing the sales figures for the last 6 months.

Note

To keep the structure easy to read, this example uses a 2:1 ratio: 6 months of historical data for 3 months of forecast. When you create your own time series predictive scenarios with a monthly granularity, we recommend a 5:1 ratio: 5 months of historical data for each month of forecast. This ratio helps the engine detect enough cycles to create a forecast.
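The recommended ratio translates directly into a minimum history length. The number of historical months is the ratio multiplied by the number of forecast months. For the 3-month forecast in this example, the 5:1 ratio suggests 5 × 3 = 15 months of history, compared with the 6 months shown here. The helper name `history_months_needed` is hypothetical.

```python
def history_months_needed(forecast_months, ratio=5):
    """Months of historical data suggested for a forecast horizon,
    using `ratio` historical months per forecast month."""
    if forecast_months < 1:
        raise ValueError("forecast_months must be at least 1")
    return ratio * forecast_months


print(history_months_needed(3, ratio=2))  # 6  (ratio used in this example)
print(history_months_needed(3))           # 15 (recommended 5:1 ratio)
```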

Training and application happen in a single step that generates N forecasts for future periods. In this example, N is 3: one product sales forecast for each of the next 3 months.
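For a monthly granularity, the N future periods are simply the N months that follow the last historical month. The sketch below lists them. The function name `next_months` and the dates used are hypothetical, since the example does not state actual dates.

```python
from datetime import date


def next_months(last_month, n):
    """Return the first day of each of the n months after last_month."""
    months = []
    year, month = last_month.year, last_month.month
    for _ in range(n):
        month += 1
        if month > 12:  # roll over into the next year
            month = 1
            year += 1
        months.append(date(year, month, 1))
    return months


# Hypothetical: the last historical month is June 2023, N = 3.
print(next_months(date(2023, 6, 1), 3))
# [datetime.date(2023, 7, 1), datetime.date(2023, 8, 1), datetime.date(2023, 9, 1)]
```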

The generated dataset will look like the one displayed below:

You can include influencers in your dataset to refine the forecasts, and doing so is highly recommended for improving forecast accuracy. In this example, the influencer values are also filled in for the next 3 months:
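In a dataset like this, the future rows have an empty target but still carry influencer values. The sketch below checks that structure and reports any row, historical or future, where an influencer value is missing. It assumes, as in this example, that influencer values are provided for the forecast periods. The column names "Date", "Product_Sales", and "Promotion_Budget", the values, and the function name are all hypothetical.

```python
def rows_missing_influencers(rows, influencers):
    """Return the dates of rows where any influencer value is missing.

    Each row is a dict. Future rows have no target value (None) but are
    still expected to carry a value for every influencer.
    """
    return [
        row["Date"]
        for row in rows
        if any(row.get(name) is None for name in influencers)
    ]


# Hypothetical rows: one historical month and two forecast months.
rows = [
    {"Date": "2023-06", "Product_Sales": 120.0, "Promotion_Budget": 10.0},
    {"Date": "2023-07", "Product_Sales": None, "Promotion_Budget": 12.0},
    {"Date": "2023-08", "Product_Sales": None, "Promotion_Budget": None},
]

print(rows_missing_influencers(rows, ["Promotion_Budget"]))  # ['2023-08']
```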

Example of a training dataset used for a time series predictive scenario with entities:

You can also forecast sales for multiple products. To do so, add the product-related information to the dataset. Then, in Smart Predict, specify the variable "Product Name" as the variable that splits your predictive model into distinct entities:

In the generated dataset, the observations are split by "Product Name", so you get a forecast for each product:
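Conceptually, splitting by "Product Name" means that each distinct product gets its own time series, and therefore its own forecast. The sketch below groups rows in that way. It only illustrates the idea and does not describe how Smart Predict is implemented. The row values, the "Date" and "Sales" columns, and the function name are hypothetical.

```python
from collections import defaultdict


def split_by_entity(rows, entity_column):
    """Group rows into one time series per distinct entity value."""
    series = defaultdict(list)
    for row in rows:
        series[row[entity_column]].append(row)
    return dict(series)


# Hypothetical rows for two products.
rows = [
    {"Product Name": "Product A", "Date": "2023-05", "Sales": 100.0},
    {"Product Name": "Product B", "Date": "2023-05", "Sales": 80.0},
    {"Product Name": "Product A", "Date": "2023-06", "Sales": 110.0},
]

series = split_by_entity(rows, "Product Name")
for product, product_rows in series.items():
    print(product, [r["Date"] for r in product_rows])
# Product A ['2023-05', '2023-06']
# Product B ['2023-05']
```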

You can also combine entities and influencers. This is recommended if you want to increase the accuracy of your time series models.