Datasets - Examples
Your dataset must follow a specific structure so that Smart Predict can use it.
Example of a training dataset used for a classification predictive scenario
In the example below, you want to predict whether a customer will buy a product. You have prepared a training dataset containing historical data about customers and whether they bought a similar product. In this dataset, the values of the target variable ("did the customer buy the product?") are known.
Once you have created your predictive model, you apply it to an application dataset. This dataset contains the same information for the customers you want to target with the new product. The target variable column ("Will_Buy?") is empty or absent, because its value is what you want to predict:
Smart Predict uses the predictive model to calculate the probability that each customer will buy the product. In the generated dataset, the "Will_Buy?" column is now filled in:
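The structural rule behind this example is that the application dataset repeats the training columns but leaves the target empty or omits it. You can express that rule as a small check. This is a minimal sketch in plain Python, not part of Smart Predict. The rows are hypothetical. Every column name except "Will_Buy?" (`Customer_ID`, `Age`, `Income`) and the `check_structure` helper are illustrative assumptions. For simplicity, the same target name is used in both datasets.

```python
# Hypothetical rows, one per customer. Column names other than "Will_Buy?"
# are illustrative assumptions, not taken from the original example.
TARGET = "Will_Buy?"

training = [
    {"Customer_ID": 1, "Age": 34, "Income": 52000, TARGET: "Yes"},
    {"Customer_ID": 2, "Age": 51, "Income": 38000, TARGET: "No"},
    {"Customer_ID": 3, "Age": 27, "Income": 61000, TARGET: "Yes"},
]

application = [
    {"Customer_ID": 4, "Age": 45, "Income": 47000},                 # target column absent
    {"Customer_ID": 5, "Age": 30, "Income": 55000, TARGET: None},   # target column empty
]


def check_structure(training, application, target):
    """Return a list of problems; an empty list means both datasets match the expected shape."""
    problems = []
    # Every non-target column of the training data must also exist in the application data.
    other_columns = set(training[0]) - {target}
    for i, row in enumerate(training):
        if row.get(target) in (None, ""):
            problems.append(f"training row {i}: target value is missing")
    for i, row in enumerate(application):
        missing = other_columns - set(row)
        if missing:
            problems.append(f"application row {i}: missing columns {sorted(missing)}")
        if row.get(target) not in (None, ""):
            problems.append(f"application row {i}: target should be empty or absent")
    return problems


print(check_structure(training, application, TARGET))  # []
```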
Example of a training dataset used for a regression predictive scenario
In the example below, you want to predict the number of complaints that your customer support will receive this week. You have prepared a training dataset containing historical values for several previous weeks. In this dataset, the values of your target variable ("how many complaints per week") are known:
You apply the regression predictive model to a new dataset that contains the same influencers. The value of the target variable for this week ("number of complaints this week") is unknown.
The predictive model then predicts the number of complaints to expect this week.
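In the regression case, the target is a number. The two datasets differ mainly in whether that number is known. The sketch below assumes hypothetical weekly rows held in one list. It separates the weeks with a known complaint count (training rows) from the week to predict (application row), and it rejects non-numeric targets. The column names `Week`, `Orders`, `New_Releases`, and `Complaints`, and the `split_by_target` helper, are illustrative assumptions and not part of Smart Predict.

```python
# Hypothetical weekly rows: "Complaints" is the numeric target; "Orders" and
# "New_Releases" are illustrative influencers (assumptions, not from the source).
weeks = [
    {"Week": "2024-W01", "Orders": 1200, "New_Releases": 0, "Complaints": 35},
    {"Week": "2024-W02", "Orders": 1350, "New_Releases": 1, "Complaints": 48},
    {"Week": "2024-W03", "Orders": 1100, "New_Releases": 0, "Complaints": 31},
    {"Week": "2024-W04", "Orders": 1500, "New_Releases": 2, "Complaints": None},  # this week
]


def split_by_target(rows, target):
    """Rows with a known numeric target go to training; rows with no target value are to be predicted."""
    training, application = [], []
    for row in rows:
        value = row.get(target)
        if value is None:
            application.append(row)
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            training.append(row)
        else:
            raise TypeError(f"{row['Week']}: regression target must be numeric, got {value!r}")
    return training, application


training, application = split_by_target(weeks, "Complaints")
print(len(training), [r["Week"] for r in application])  # 3 ['2024-W04']
```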
Example of a dataset used for a time series predictive scenario (non-segmented)
Training and application happen in a single step, which generates N forecasts into the future. In this example, we want to forecast product sales for the next 3 months.
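Smart Predict generates the future periods itself. To illustrate what "N forecasts in the future" means for monthly data with N = 3, this sketch computes the three months that follow the last observed month. The `next_months` helper and the sales figures are hypothetical.

```python
from datetime import date


def next_months(last: date, n: int) -> list[date]:
    """Return the first day of each of the n months following `last` (hypothetical helper)."""
    out = []
    year, month = last.year, last.month
    for _ in range(n):
        month += 1
        if month > 12:  # roll over into the next year
            year, month = year + 1, 1
        out.append(date(year, month, 1))
    return out


# Hypothetical monthly sales history: (first day of month, units sold).
history = [(date(2023, 10, 1), 410), (date(2023, 11, 1), 455), (date(2023, 12, 1), 530)]
horizon = next_months(history[-1][0], 3)
print(horizon)
# [datetime.date(2024, 1, 1), datetime.date(2024, 2, 1), datetime.date(2024, 3, 1)]
```

Each date in `horizon` corresponds to one forecast row in the generated dataset.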
The generated dataset will look like the one displayed below:
You can add influencers to your dataset to refine the forecasts, and doing so is highly recommended to improve forecast accuracy. The values of these influencers are also filled in for the next 3 months:
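Because the influencer values are filled in for the forecast months as well, it can help to confirm that no future month is missing one. This is a minimal sketch assuming hypothetical rows. In these rows, "Sales" is the target and is empty for future months. `Promotion_Budget` is an illustrative influencer. The helper name is also an assumption.

```python
from datetime import date

# Hypothetical monthly rows. "Sales" is the target (None for future months);
# "Promotion_Budget" is an illustrative influencer, not taken from the source.
rows = [
    {"Month": date(2023, 11, 1), "Promotion_Budget": 5000, "Sales": 455},
    {"Month": date(2023, 12, 1), "Promotion_Budget": 8000, "Sales": 530},
    {"Month": date(2024, 1, 1), "Promotion_Budget": 4000, "Sales": None},
    {"Month": date(2024, 2, 1), "Promotion_Budget": 4500, "Sales": None},
    {"Month": date(2024, 3, 1), "Promotion_Budget": None, "Sales": None},
]


def future_rows_missing_influencers(rows, target, influencers):
    """Return the future periods (target unknown) that lack a value for any influencer."""
    return [
        row["Month"]
        for row in rows
        if row[target] is None and any(row.get(name) is None for name in influencers)
    ]


print(future_rows_missing_influencers(rows, "Sales", ["Promotion_Budget"]))
# [datetime.date(2024, 3, 1)]
```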
Example of a training dataset used for a time series predictive scenario with entities
You can also forecast sales for multiple products. To do so, add the product-related information to the dataset. In Smart Predict, you then specify that the variable "Product Name" is used to split your predictive model into distinct entities:
In the generated dataset, the observations are grouped by "Product Name", so you get forecasts for each product:
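Conceptually, declaring "Product Name" as the entity variable gives each product its own series and its own forecasts. This sketch shows only that grouping step on hypothetical rows. It does not reproduce how Smart Predict trains the per-entity models. The product names, the `Month` and `Sales` columns, and the `split_into_entities` helper are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical sales rows for two products; "Product Name" is the entity column.
rows = [
    {"Product Name": "Laptop", "Month": "2023-11", "Sales": 120},
    {"Product Name": "Tablet", "Month": "2023-11", "Sales": 80},
    {"Product Name": "Laptop", "Month": "2023-12", "Sales": 150},
    {"Product Name": "Tablet", "Month": "2023-12", "Sales": 95},
]


def split_into_entities(rows, entity_column, date_column, target):
    """Group rows into one (date, target) series per entity value, sorted by date."""
    series = defaultdict(list)
    for row in rows:
        series[row[entity_column]].append((row[date_column], row[target]))
    return {entity: sorted(points) for entity, points in series.items()}


for product, points in split_into_entities(rows, "Product Name", "Month", "Sales").items():
    print(product, points)
# Laptop [('2023-11', 120), ('2023-12', 150)]
# Tablet [('2023-11', 80), ('2023-12', 95)]
```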
You can also combine entities and influencers, which is recommended if you want to increase the accuracy of your time series models.