Dataset Prerequisites

What are the different types of datasets that you can use in Smart Predict?

In SAP Analytics Cloud you can create two types of datasets:

  • Acquired: Data is imported (copied) and stored in SAP Analytics Cloud. Changes made to the data in the source system don't affect the imported data.
  • Live: Data is stored in the source system. It isn't copied to SAP Analytics Cloud, so any changes in the source data are available immediately, as long as no structural changes are made to the table or SQL view.

What mandatory information must your input datasets contain?

Depending on the type of predictive model that you are creating, your input dataset must contain certain essential information.

Classification and Regression Predictive Model

  • Your training dataset must contain at least one column for the target variable, and multiple columns for other variables that you think may have an influence on the target variable. These variables are called Influencers.
  • The datasets used to train and apply a predictive model must come from the same type of data source (acquired or live). You can't apply a predictive model to a live dataset if it was trained with an acquired dataset, nor can you apply a predictive model to an acquired dataset if it was trained with a live one. However, you can have several predictive models trained and applied with live and acquired datasets in the same predictive scenario.
    Note
    When using live datasets, both live datasets (training and apply datasets) must come from the same SAP HANA system: you can't train a predictive model with a live dataset based on SAP HANA system 1 and then apply this predictive model to a live dataset based on SAP HANA system 2.
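The rules above can be expressed as a simple pre-check. The following Python sketch is hypothetical: `DatasetInfo` and `check_classification_regression` are illustrative names, not part of SAP Analytics Cloud or any SAP API. It only checks that the training dataset has the target column plus at least one other (candidate influencer) column, that the training and apply datasets have the same source type, and that live datasets come from the same SAP HANA system.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DatasetInfo:
    """Hypothetical description of a dataset; not an SAP Analytics Cloud API."""
    name: str
    source_type: str                      # "acquired" or "live"
    columns: List[str] = field(default_factory=list)
    hana_system: Optional[str] = None     # only meaningful for live datasets


def check_classification_regression(train: DatasetInfo, apply_ds: DatasetInfo,
                                    target: str) -> List[str]:
    """Return a list of rule violations; an empty list means all checks passed."""
    problems: List[str] = []
    # The training dataset needs the target column...
    if target not in train.columns:
        problems.append(f"training dataset {train.name!r} has no target column {target!r}")
    # ...and at least one other column that could act as an influencer.
    if not [c for c in train.columns if c != target]:
        problems.append("training dataset has no candidate influencer columns")
    # Training and apply datasets must share the same source type.
    if train.source_type != apply_ds.source_type:
        problems.append(f"training dataset is {train.source_type} "
                        f"but apply dataset is {apply_ds.source_type}")
    # Two live datasets must also come from the same SAP HANA system.
    elif train.source_type == "live" and train.hana_system != apply_ds.hana_system:
        problems.append("live training and apply datasets come from different SAP HANA systems")
    return problems


# Usage with made-up dataset and column names.
train = DatasetInfo("churn_train", "live", ["churn", "age", "tenure"], hana_system="HANA_1")
apply_ds = DatasetInfo("churn_apply", "live", ["age", "tenure"], hana_system="HANA_2")
print(check_classification_regression(train, apply_ds, "churn"))
# prints ['live training and apply datasets come from different SAP HANA systems']
```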

Time Series Forecasting Predictive Model

  • Your training dataset must contain a column for the signal variable, which holds the values that you want to forecast. The signal variable must be continuous, with no missing values.
  • Your dataset must also contain a column for the date variable.
    Note
    The date must use one of the following formats:
    • YYYY-MM-DD
    • YYYY/MM/DD
    • YYYY/MM-DD
    • YYYY-MM/DD
    • YYYYMMDD
    • YYYY-MM-DD hh:mm:ss
    where YYYY stands for the year, MM stands for the month, DD stands for the day of the month, hh stands for hour, mm stands for minutes, and ss stands for seconds.
    Example
    January 25, 2018 can be written in any of the following supported formats:
    • 2018-01-25
    • 2018/01/25
    • 2018/01-25
    • 2018-01/25
    • 20180125
  • Forecast accuracy can be improved when the training dataset includes influencers: other variables that you think may have an influence on the signal variable. Values for influencers are normally available over the observation period, but you must also provide influencer values for the period you want to forecast. If influencer values don't cover every forecast period, the predictive model won't be successful, because observations are needed for all of the requested forecast dates.

    For example, if you want to forecast chocolate sales for the year, you could add the dates of festive occasions to your dataset, such as Easter, Christmas, Mother's Day, or Valentine's Day.
    Note
    The predictive forecasts depend on the quality of the data you provide: both the historical data and the future values of the influencers can affect your predictive model's accuracy. In some cases, it can be difficult to provide high-quality future values. For example, future weather values can only be forecasts themselves.
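The time series requirements can also be sketched as a pre-check in Python. This is a minimal illustration under stated assumptions, not how Smart Predict validates data: the row layout (a `history` list of observed rows and a `future` list of rows for the forecast dates, each row a dict), the column names `Date`, `Sales`, and `Holiday`, and the helpers `parse_supported_date` and `check_time_series` are all hypothetical. The sketch also assumes that `hh` uses a 24-hour clock and treats "continuous" as "numeric".

```python
import math
import re
from datetime import datetime
from typing import Dict, List, Optional, Sequence

# The six supported date formats as (exact shape, strptime pattern) pairs.
# Assumption: "hh" is read as a 24-hour clock (%H).
SUPPORTED_FORMATS = [
    (re.compile(r"\d{4}-\d{2}-\d{2}"), "%Y-%m-%d"),
    (re.compile(r"\d{4}/\d{2}/\d{2}"), "%Y/%m/%d"),
    (re.compile(r"\d{4}/\d{2}-\d{2}"), "%Y/%m-%d"),
    (re.compile(r"\d{4}-\d{2}/\d{2}"), "%Y-%m/%d"),
    (re.compile(r"\d{8}"), "%Y%m%d"),
    (re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"), "%Y-%m-%d %H:%M:%S"),
]


def parse_supported_date(text: str) -> Optional[datetime]:
    """Return the parsed date, or None if text is not in a supported format."""
    for shape, fmt in SUPPORTED_FORMATS:
        # The regex enforces exact digit counts; strptime alone would also accept "2018-1-5".
        if shape.fullmatch(text):
            try:
                return datetime.strptime(text, fmt)
            except ValueError:
                return None  # right shape but impossible date, such as month 13
    return None


def is_missing(value: object) -> bool:
    return value is None or (isinstance(value, float) and math.isnan(value))


Row = Dict[str, object]


def check_time_series(history: Sequence[Row], future: Sequence[Row], date_col: str,
                      signal_col: str, influencers: Sequence[str]) -> List[str]:
    """Hypothetical pre-check; returns rule violations (empty list means OK)."""
    problems: List[str] = []
    # Every date, observed or future, must use one of the supported formats.
    for row in list(history) + list(future):
        if parse_supported_date(str(row.get(date_col, ""))) is None:
            problems.append(f"unsupported date: {row.get(date_col)!r}")
    # The signal must be a non-missing number on every observed date (bool excluded).
    for row in history:
        value = row.get(signal_col)
        if is_missing(value) or isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"missing or non-numeric signal on {row.get(date_col)!r}")
    # Every forecast date needs a value for every influencer.
    for row in future:
        for name in influencers:
            if is_missing(row.get(name)):
                problems.append(f"no {name!r} value for forecast date {row.get(date_col)!r}")
    return problems


# Usage: chocolate sales with a made-up "Holiday" influencer flag.
history = [
    {"Date": "2018-01-25", "Sales": 120.0, "Holiday": 0},
    {"Date": "2018/01/26", "Sales": 135.5, "Holiday": 0},
]
future = [
    {"Date": "20180214", "Holiday": 1},
    {"Date": "2018-02-15", "Holiday": None},
]
print(check_time_series(history, future, "Date", "Sales", ["Holiday"]))
# prints ["no 'Holiday' value for forecast date '2018-02-15'"]
```

The regular expressions check the exact digit layout of each format before `strptime` validates the calendar date, because `strptime` on its own would also accept unpadded values such as `2018-1-25`.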