Datasets for Smart Predict
Datasets can be used as data sources for Smart Predict. However, they must have a specific
structure and contain certain mandatory information, depending on the type of predictive
scenario you are creating and where you are in the modeling process.
What are datasets?
A dataset is a collection of data that is usually presented as a table. Each row
represents an observation (the entity you are interested in), and each column
represents a piece of information about that observation. One of the columns
is the target variable, which holds the value you want to predict.
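The row-per-observation, column-per-variable layout described above can be checked locally before you upload a file. The following Python sketch is a hypothetical helper, not part of SAP Analytics Cloud or Smart Predict: it assumes a comma-separated file with a header row and reports a missing target column or rows whose number of values doesn't match the header.

```python
import csv
import io


def check_dataset_structure(csv_text: str, target: str) -> list[str]:
    """Return structural problems found in a CSV dataset (hypothetical helper).

    Assumes the first row is a header naming each variable (column) and
    every following row is one observation.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["dataset is empty"]

    header = rows[0]
    problems: list[str] = []

    # One of the columns must be the target variable.
    if target not in header:
        problems.append(f"target column {target!r} not found")

    # Every observation should provide one value per column.
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"row {line_no}: expected {len(header)} values, found {len(row)}"
            )
    return problems
```

For example, with a hypothetical file whose header is `id,age,churned`, calling `check_dataset_structure(text, "churned")` returns an empty list when every row has three values.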
Depending on the nature of the data it contains, you can use a dataset to create
a particular type of predictive model suited to your needs.
The graphic below summarizes which dataset is used depending on the step of the
predictive process:
Note
There are sizing restrictions for acquired datasets. Refer to the related link below
for more information.
Input Datasets
Input datasets are stored in the Files section of SAP
Analytics Cloud. You create an input dataset using the .
In SAP Analytics Cloud, you can use one of the following types of input datasets:
- Acquired: Data is imported (copied) and stored in SAP
Analytics Cloud. Acquired datasets are prepared beforehand on
your computer (supported formats are .TXT, .CSV, and .XLSX).
- Live: Data is stored in the source system. It isn't
copied to SAP Analytics Cloud, so changes to the source
data are available immediately, provided no structural changes are made to the
table or SQL view. You can connect to live data and create a live
dataset.
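Because acquired datasets are limited to the file formats listed above, a quick extension check can catch unsupported files before upload. This Python sketch is hypothetical and only inspects the file name; it doesn't validate the file contents.

```python
from pathlib import Path

# File formats this section lists as supported for acquired datasets.
SUPPORTED_ACQUIRED_FORMATS = {".txt", ".csv", ".xlsx"}


def is_supported_acquired_file(filename: str) -> bool:
    """Return True if the file extension (case-insensitive) is supported."""
    return Path(filename).suffix.lower() in SUPPORTED_ACQUIRED_FORMATS
```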
An input dataset is used either to train a predictive model (training dataset) or to apply
a predictive model (application dataset).
Caution
The input datasets
used to train and apply a predictive model must come from the same type of data
source. You can't apply a predictive model to a live dataset if it was trained
with an acquired dataset, nor can you apply a predictive model to an acquired
dataset if it was trained with a live one. However, a single predictive scenario
can contain several predictive models, some trained and applied with live datasets
and others with acquired datasets.
Note
When using live datasets, the training and application datasets must both
come from the same SAP HANA system. For example, you can't train a
predictive model with a live dataset from SAP HANA system 1
and then apply this predictive model to a live dataset from
SAP HANA system 2.
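The rules in the Caution and Note above can be summarized as a small decision function. This Python sketch is a hypothetical model of the documented rules, not an SAP API; the `DatasetSource` type and its `kind` and `hana_system` fields are names invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DatasetSource:
    kind: str                          # "acquired" or "live" (hypothetical labels)
    hana_system: Optional[str] = None  # only meaningful for live datasets


def can_apply(training: DatasetSource, application: DatasetSource) -> bool:
    """Return True if a model trained on `training` may be applied to `application`."""
    # Acquired-trained models can't be applied to live datasets, and vice versa.
    if training.kind != application.kind:
        return False
    # Live training and application datasets must come from the same SAP HANA system.
    if training.kind == "live":
        return training.hana_system == application.hana_system
    return True
```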
Restriction
For live datasets, any data changes you make to your tables
and SQL views in your SAP HANA on-premise system appear immediately in live
datasets. However, your predictive model isn't updated automatically: to take
these changes into account, you need to retrain it.
To create a predictive model, you need a training dataset that
contains actual data observed in the past.
Note
For time series predictive models, the dataset may
also contain data for the future.
For example, if you have included
additional variables in your data model to refine the forecasts, the values
of these variables should also be filled in for the forecast period.
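The Note above can be turned into a simple pre-check on a time series dataset. This Python sketch is hypothetical: it assumes that rows for the forecast period are the ones with an empty target value, and it lists those rows where one of the additional variables (influencers) has no value. The column names in the test data are illustrative only.

```python
def rows_missing_future_values(
    rows: list[dict], target: str, influencers: list[str]
) -> list[int]:
    """Return indexes of future rows (empty target) lacking influencer values.

    Assumption: a row belongs to the forecast period when its target is
    None or an empty string.
    """
    missing = []
    for i, row in enumerate(rows):
        is_future = row.get(target) in (None, "")
        if is_future and any(row.get(col) in (None, "") for col in influencers):
            missing.append(i)
    return missing
```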
Then, you apply the predictive model to an application dataset.
Note
For a time series predictive model, you use the same dataset for both
the training and application steps.
Restriction
When using acquired datasets, your
input datasets (training and application datasets) must not contain more than
1,000 columns. When you apply the predictive model to an application dataset,
Smart Predict generates additional columns. As a result, the application
process can be blocked if your application dataset already contains so many
columns that the generated columns would push it over the 1,000-column limit.
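Since the columns generated during the application step can push a dataset over the limit, it can help to check the column budget before applying a model. In this hypothetical Python sketch, the number of generated columns is an input you must estimate yourself, because it depends on the outputs you request; the function assumes the 1,000-column limit applies to the total after those columns are added, as the restriction above suggests.

```python
MAX_COLUMNS = 1_000  # column limit for acquired input datasets, per this section


def fits_column_limit(input_columns: int, generated_columns: int) -> bool:
    """Return True if input plus (estimated) generated columns stay within the limit.

    `generated_columns` is an estimate you supply; Smart Predict decides the
    actual number when it applies the model.
    """
    if input_columns < 0 or generated_columns < 0:
        raise ValueError("column counts must be non-negative")
    return input_columns + generated_columns <= MAX_COLUMNS
```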
Generated Datasets
When you click the Apply button to get your predictions, a dataset
containing your predictions is generated. You can choose the directory in which
to save this dataset. By default, generated datasets are saved in this folder: .
Note
If a dataset with the same name as the one you are saving already exists,
the following rules apply:
- If both datasets have identical variables, the new dataset automatically
replaces the existing one.
- If the datasets have different variables, you receive an Apply Failed message. To continue, save your dataset under a different name.
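The name-conflict rules above can be expressed as a small function. This Python sketch is a hypothetical model of the documented behavior, not an SAP API, and it assumes that "identical variables" means the same variable names regardless of order.

```python
from typing import Optional


def save_outcome(existing_variables: Optional[list[str]], new_variables: list[str]) -> str:
    """Model the result of saving a generated dataset under an existing name.

    `existing_variables` is None when no dataset with that name exists yet.
    Assumption: variables are compared by name, ignoring order.
    """
    if existing_variables is None:
        return "saved"          # no name conflict
    if set(existing_variables) == set(new_variables):
        return "replaced"       # new dataset overwrites the existing one
    return "Apply Failed"       # save under a different name to continue
```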
The generated dataset contains the predictions and any additional columns you have
requested.
Note
You can then use this generated dataset to create a story or an SAP Analytics Cloud model.
However, if you want updates to the generated dataset to be reflected, SAP recommends
using it in a story: if you reapply your predictive model and overwrite the generated
dataset with an updated one, the story is updated. For example, if you have
added rows to your application dataset, the predictions generated for these new rows
are added to the story. By contrast, if you use the generated dataset in an SAP
Analytics Cloud model, the SAP Analytics Cloud model isn't updated.