After importing raw data into a new or existing model, or into a story, you may need to perform some data preparation.
For detailed background information about data preparation, see About Preparing Data.
Importing the data
Typically, when you import draft data into a new or existing model, the data acquisition will happen in the background while you work on something else.
Load the raw data following the steps in the help topic specific to your workflow and data source:
Viewing the data imported to create a new modelDuring the initial data import, the data type of each column is inferred, as well as dimension type in the data integration view. In the right hand Details side panel, you will have a summary of the characteristics of the dataset.
- Under Layout, select the Card View icon to display all the columns as
separate cards. Each card includes the column name, a specific summary of the
data corresponding to its type (mean value for numbers, date ranges for dates,
number of unique values for text, etc.), as well as a quality bar that shows in
green the proportion of values formatted correctly – those in red formatted
incorrectly and require action, and those in gray are empty.
You can sort the displayed cards in original, ascending, or descending order, or according to the model structure (listing all measures first, followed by dimensions). If you switch to (Grid View), the columns associated with your selected cards are automatically highlighted.
- The Details panel is divided into two sections:
- Model level: shows general information about the imported data, including any data-quality issues including mapping issues, and model-specific information. A Model Options section is provided to enable planning, set the default currency for the model, plus other options.
- Column level: allows you to map columns to existing dimensions in the model, displays histograms for the currently selected column (rectangles indicating frequency of data items in successive numerical intervals), lists the attribute and dimension types that you can apply to the column, and displays any data-quality issues represented as quality bars under Data Quality. The green section indicates correct values, while those in red indicate incorrect values. Those in gray are empty.
- When you select a column in (Grid
View) or Card
View, numerical or textual histograms are displayed under
- Text histograms are horizontal, and the values are clustered by count. The number of clusters can be adjusted by dragging the slider shown above the histogram. When a cluster contains more than one value, the displayed count is the average count for the cluster. The count is prefaced by a tilde symbol (~) if there are multiple different occurrences. Expand the cluster for a more detailed view of the values in the cluster along with individual counts. Use the search tool to look up specific column values, and press Enter to initiate the search. When you select a value in the histogram, the corresponding column is sorted and the value is highlighted in the grid.
- Numerical histograms are vertical, and represent the range of values along the x-axis. Hover over any bar to show the count, minimum, and maximum values for the data in the bar. The number of bars can also be adjusted by using the slider above the histogram. Toggling the Show Outliers box includes or removes outliers from the histogram.
- You use level-based hierarchies when your data is organized into levels, such as Product Category, Product Group, and Product. When the data is displayed in a story, hierarchies can be expanded or collapsed. In the toolbar, select (Level Based Hierarchy). For more information, see Creating a Level-Based Hierarchy.
- Select any of the following Model Options:
- Specify whether you want the model to be Planning Enabled. For information about model types, see Planning Models and Analytics Models.
- Specify whether you want empty ID cells to be filled with automatically generated values. If you clear this check box, rows with empty ID cells will be omitted.
Viewing and mapping data imported to an existing model
The Card View displays after you import data into an existing model. By default, cards representing dimensions and measures are displayed. Double-click a card to display attribute cards under the selected card.
Cards display as either mapped or unmapped entities. A card represents mapped data if it is shaded solid and has defined borders.
A card represents unmapped data if it does not have a border and appears transparent.
A red dot in the top right corner of the card indicates there are mapping errors that need to be addressed. A blue dot indicates that new values have been added from the import to the existing data.
|Attribute only available for organization dimension|
|Description or property attributes (for example, unit, scale, or decimal)|
When you select a card, the Details panel is populated with information specific to your selection – the number of rows, unique values, data types, and corresponding histograms. A pie chart is provided under Mapping Quality showing the proportion of Accepted versus Rejected values.
- Original Order: reflects the original order in the imported data source.
- A-Z: alphabetical order based on column headings.
- Z-A: reverse alphabetical order based on column headings.
- Mapped First: mapped columns listed alphabetically before unmapped columns.
- Unmapped First: unmapped columns listed alphabetically before mapped columns.
The following headings appear under color-coded progress bars indicating what percentage of the imported data was accepted: Dimensions, Attributes, and Measures. If a progress bar is not 100 percent blue, select the heading for more information on the mapping progress and a listing of unmapped entities.
Mapping imported data into an existing model:
- Determine which unmapped imported columns can be associated with specific transparent borderless cards in the card view.
- Drag and drop an unmapped imported column on the associated card.
- Map the remaining imported columns.
- To map an unmapped location dimension, select the corresponding card and from Map LocationsGeo by Area Name if your locations are based on area data, or the to display either the Map Latitude/Longitude dialog if your locations are based on coordinates. For coordinates, use the Latitude and Longitude fields to select corresponding columns. If your locations are based on area, select a column corresponding to Region and another for Sub-Region 1. choose
- Under Mapping Options select View all
options to choose any of the following options:
- Update dimension with new values: Select this
option if you want to add and update dimension members:
- New members in the dimension are added to the model.
- Changes to the description or properties of existing members are updated in the model.
- Changes to a hierarchy that has been previously defined are updated in the model.
- Convert value symbol by account type: Choose how accounting data is handled in this import. Special data types for accounting data are available. These automatically apply negative signs to values in the database as necessary, although the data shown on screen is shown as a positive value (this applies to the types INC and LEQ – refer to the Account dimension type for more information). When this option is selected, the value symbol is matched to the account type. For example, expenses are negative values and revenues are positive values. See Attributes of an Account Dimension and Preparing Data to display either the for details.
- Fill applicable empty ID cells with the "#" value: When creating a new model, if member IDs are empty, they are automatically filled with the “#” value if you select this option. Otherwise, those rows are omitted from the model.
- Update dimension with new values: Select this option if you want to add and update dimension members:
- Select the Import Method to determine how existing data
A choice of three import methods is available.
Update Updates the existing data and adds new entries to the target model. The scope of this update is based on a combination of all dimensions in the target model. For a more refined scope, use either the Clean and replace selected version data or Clean and replace subset of data update options. Append Keeps the existing data as is and adds new entries to the target model. Clean and replace selected version data Deletes the existing data and adds new entries to the target model, only for the versions that you specify in the import. You can choose to use either the existing version or specify a new version under Version. If you specify to import data for the "actual" version, only the data in the "actual" version is cleaned and replaced. Other versions, for example "planning", are not affected. Clean and replace subset of data
Replaces existing data and adds new entries to the target model for a defined subset of the data based on a scope of selected versions using either the Existing Version or New Version buttons. You can also limit the scope to specific dimensions. To define a scope based on a combination of dimensions, select + Add Scope and use the Select a dimension field to specify a dimension.
When a Date dimension is defined in the scope, the time range in the source data (calculated based on the target model's granularity) combined with other dimensions in the scope will determine what existing data is cleaned and replaced.If for example, Date and Region dimensions are defined as part of a scope, only entries that fall within the time range and match Region from the source data will be replaced in the target model. Existing data that does not match the scope will be kept as is.
- Choose which Category you want to use for the imported data.
- You use level-based hierarchies when your data is organized into levels, such as Product Category, Product Group, and Product. When the data is displayed in a story, hierarchies can be expanded or collapsed. In the toolbar, select (Level Based Hierarchy). For more information see Creating a Level-Based Hierarchy.
Combining data with your acquired data
- Under Transformations, select the
(Combine Data) icon.
The Let's add some data! dialog is displayed.
- Choose how you want to add data:
- Import a file.
- Acquire data from a data source.
Once the new data is loaded, the Combine Data dialog is displayed. Under Combine Settings, a table representing all columns in the original data is displayed on the left, and a table listing columns from the new dataset is on the right.
- Choose the most appropriate columns in each table to combine data. Click to select, or drag columns to the field provided under Combine Column for each table.
- Select how you want to combine the columns:
- All primary data: maintains the non-intersecting data from the original dataset.
- Intersecting data only: omits any data from the original dataset that does not match in the new data.
The results for the column combination in terms of Accepted, Duplicated, Nulls, and Omitted values are provided under Combine Preview. A sample of combined data rows is displayed under Data Sample.
- Select Combine when you are ready to combine the two tables.
Importing cross table data
- Once the acquired data (or sample) is loaded in the data integration view,
select the (Unpivot) icon from the
The Create an Unpivot Transform panel appears in Details.
- Select under Unpivot Rows the rows you want to use to create new columns. By default column headers become members of the new column.
- Under Unpivot Columns select columns you want to include in the new table format. Selecting Select All will include all the listed columns.
- Set Preview on grid on to preview the new format.
- Select Unpivot to implement the Unpivot transform and to actualize the new format.
Improving data quality
- If you are creating a new model, in the Model level: review the model name, change it if desired, and enter an optional description.
- To edit column names, double-click the column headers, and provide the new name.
- To define your dimensions when creating a new model, select a column in the
Column view, and expand
Modeling to specify any of the following dimension
types for the column:
If version data is present in your source file, select the column that contains the version. Set the Modeling type to Version to display an additional Version Mapping feature available in the Column tab. Select the Map Versions button to open an additional dialog where you can map a dimension category (such as Actuals or Forecast) to the version of the data being imported.
- When creating a new model you can select Add Dimension Attributes to define other columns as attributes for the selected dimension.
- To create a location dimension column for a new model, you need to specify what
type of location dimension you want to include - location by region or location
- Select the (Geo Enrichment) icon under Transformations, and choose either Location by Region or Location by Coordinates. Select columns to associate with longitude/latitude or region/sub-region. A location dimension column is created. When you use the Location by Coordinates option you do not need to select a column to serve as location identifier. As you prepare the data, if you modify any of the coordinate data used to create the location dimension, the location dimension values will automatically reflect your modifications.
- For measures, you can add a unit label: for example, cartons, kilograms, etc.
- You should resolve any items that are listed in the Details panel under either the Data Quality or Model Requirements when creating a new model, or Mapping Quality if you are adding new data to an existing model. For more information, see the "Data Quality Considerations" section in About Preparing Data.
- Select a message to see the options available to resolve any identified quality
issue. Use the context sensitive editing features and the transform bar to edit
data in a cell, column, or card. When selecting a column, a cell, or content in
a cell, a menu will appear with options for performing transforms. There are two
parts to this menu:
- Choose the Quick Actions option to perform actions such as duplicate column, trim whitespace, hide columns, or delete columns or rows. The table below lists all the available actions.
- Select the (Smart Transformations) icon to list suggested transformations to apply to the column, such as replacing the value in a cell with a suggested value. You can also select Create a Transform and choose from the options listed under the transformation bar displayed above the columns. The transformation bar is used to edit and create transforms.
As you hover over a suggested transformation, the anticipated results are previewed in the grid. To apply the transformation, simply select the transform. You can manually enter your own transformation in the transformation bar; as the transform is built, a preview is provided in the affected column, cell, or content within the cell. The table below summarizes the available transformations you can apply to selected columns, cells, or content within cells.
Transformation Description Transform Bar Format Delete Rows Delete rows in the data, either by selecting individual members, or by specifying a range (not possible for text columns; only numerical and Date columns). Provided only as a quick action Trim Whitespace Remove spaces, including non-printing characters, from the start and end of strings. Provided only as a quick action Duplicate Column Create a copy of an existing column. Provided only as a quick action Delete Column Delete a column. Use the Shift key to select and then delete multiple columns. Provided only as a quick action Concatenate Combine two or more columns into one. An optional value can be entered to separate the column values. Concatenate [<Column1>], [<Column2>]… using "value" Split Split a text column on a chosen delimiter, starting from left to right. The number of splits can be chosen by the user. Split [<Column>] on "delimiter" repeat "#" Extract Use this transform to extract a block of text specified as numbers, words, or targeted values within a column to a new column. Extract [<what to extract]>] [<where to extract>] [<which occurrence>] [""] from [<column name>] [<include value option> ].
Options for what to extract:
Options for where to extract relative to the target value:
- number Limits extracted text to one number from column cell.
- word Limits extracted text to a word from column cell.
- everything Includes all text.
Options for specifying occurrence:
- containing: Extracts a word or number containing the target. For example, if your target is ship, both ship shipping will be extracted.
- equal to: Extracts the specific target value if it exists in the cell.
- occurrence: Allows you to specify the position of the target from one to ten.
To extract the first number from all column cells, you would specify the following:Extract before first "" from [<column name>].
To extract all text between parenthesis in column cells, you would specify:Extract everything between "(" and ")" from [<column name>].
Change Change a column to uppercase, lowercase, or title case. Change [<Column>] to (<UPPERCASE>/<lowercase>/<TitleCase>) Replace Replaces either an entire cell or content that could be found in multiple different cells. Replace (<cell/content>) in [<Column>] matching "value" with "value"
With a <where> clause: Replace (<cell/content>) in [<Column>] matching "value" with "value" where[<Column>]is "value"
- A history of all transforms you implement is displayed in Transform Log. There are two levels of transformation logs recorded: transforms on the entire dataset, and transformations on the currently selected column. Hover over a specific transform in the Transform Logto highlight the impacted column. You can roll back the change by either deleting the entries in the history or using the (Undo/Redo) buttons on the toolbar. You can remove transforms in the Transform Log panel out of sequential order provided that there are no dependencies.
Completing data preparation
Once you've finished preparing your data and resolved all data or model quality issues, you're ready to create the new model or finish mapping the data you're adding to the model or story.
- a. If you're working with a dataset sample, choose Validate Data to apply your transforms across the entire dataset and check for data quality errors.
- b. Choose either Create Model to import the entire dataset and create your new model, or if you're adding data to a model or story, choose Finish Mapping.