Dataset Structure

To be used in SAP Analytics Cloud, your dataset must contain columns (corresponding to the variables) and rows (corresponding to the observations).

When the goal is to create a classification or regression predictive scenario, the corresponding training dataset must contain at least one column for the target variable and multiple columns for influencer variables.
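As a rough illustration of this requirement, the following Python sketch checks a list of column names before a dataset is uploaded. The function name `split_columns` and the sample column names are hypothetical and not part of SAP Analytics Cloud. The sketch only assumes that every column other than the target is a candidate influencer.

```python
def split_columns(header, target):
    """Split column names into the target and the candidate influencers.

    Hypothetical helper: raises ValueError when the target column is
    missing or when no other column is left to act as an influencer.
    """
    if target not in header:
        raise ValueError(f"target column {target!r} not found")
    influencers = [name for name in header if name != target]
    if not influencers:
        raise ValueError("no influencer columns besides the target")
    return target, influencers


# Made-up column names for a churn classification dataset
target, influencers = split_columns(["age", "income", "churn"], "churn")
```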

If you want to create a time series predictive model instead, the training dataset should contain:
  • A column with the date variable (one date per period of time). The date format must be one of the following:
    • YYYY-MM-DD
    • YYYY/MM/DD
    • YYYY/MM-DD
    • YYYY-MM/DD
    • YYYYMMDD
    • YYYY-MM-DD hh:mm:ss
    where YYYY stands for the year, MM stands for the month, DD stands for the day of the month, hh stands for the hour, mm stands for the minutes and ss stands for the seconds.
    Example
    January 25, 2018 can be written in any of the following supported formats:
    • 2018-01-25
    • 2018/01/25
    • 2018/01-25
    • 2018-01/25
    • 20180125
  • A column with the value that you want to forecast, that is, the signal variable.

  • While this is not mandatory, we highly recommend also including influencer variables in the training dataset, for both the past dates and the forecasted period, to improve the forecast accuracy.
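To check a date column against the supported formats listed above before uploading, a minimal Python sketch could look like the following. It runs outside SAP Analytics Cloud. The name `parse_supported_date` is hypothetical, and the `strptime` patterns are this sketch's translation of the listed formats, not an official specification.

```python
from datetime import datetime

# strptime patterns mirroring the date formats listed above
SUPPORTED_FORMATS = [
    "%Y-%m-%d",
    "%Y/%m/%d",
    "%Y/%m-%d",
    "%Y-%m/%d",
    "%Y%m%d",
    "%Y-%m-%d %H:%M:%S",
]


def parse_supported_date(text):
    """Return a datetime if text matches a listed format, else None."""
    for fmt in SUPPORTED_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue
        # strptime tolerates missing zero padding (for example "2018-1-25"),
        # so only accept values that round-trip to the same string.
        if parsed.strftime(fmt) == text:
            return parsed
    return None


# Collect the values of a date column that match none of the formats
dates = ["2018-01-25", "2018/01-25", "20180125", "25/01/2018"]
invalid = [d for d in dates if parse_supported_date(d) is None]
```

The round-trip comparison is a design choice of this sketch: it rejects loosely formatted values that `strptime` would otherwise accept.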
If your goal is to create a segmented time series predictive model, make sure that one column includes the segment information.
Example
Say that you work for an energy supplier and want to estimate the energy consumption over the next 24 months, by dwelling sector of a given district. Your dataset should then contain at least three columns, including the following data:
  • The consumption (signal variable) for the past dates.

  • The date (date variable), including one date per month over several years.
  • The dwelling sector that each consumption value belongs to (segment information).
  • Optionally, influencer variables (for example, various calendar events) that occur in the past and in the next 24 months.
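The layout described in this example can be sketched in Python as follows. All column names and values are made up for illustration, and `check_structure` is a hypothetical helper, not an SAP Analytics Cloud function. Leaving the signal empty in rows for the forecasted period is an assumption of this sketch.

```python
import csv
import io

# Made-up illustrative rows; column names are hypothetical.
# The last two rows cover a future month: the influencer is filled in,
# while the signal is left empty (an assumption of this sketch).
SAMPLE_CSV = """date,sector,consumption,public_holidays
2023-01-01,residential,120.5,1
2023-02-01,residential,110.2,0
2023-01-01,commercial,300.1,1
2023-02-01,commercial,290.7,0
2023-03-01,residential,,0
2023-03-01,commercial,,0
"""

REQUIRED_COLUMNS = {"date", "sector", "consumption"}


def check_structure(csv_text):
    """Return a list of structural problems found in the dataset text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        return [f"missing columns: {sorted(missing)}"]
    problems = []
    seen = set()
    for row in reader:
        # Each segment should have exactly one row per date.
        key = (row["sector"], row["date"])
        if key in seen:
            problems.append(
                f"duplicate date {row['date']} for segment {row['sector']}"
            )
        seen.add(key)
    return problems


problems = check_structure(SAMPLE_CSV)
```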

To summarize, your dataset must contain the right data in the right format, depending on the type of predictive scenario.