Preparing Data
After importing raw data into a new or existing model, or into a story, you may need to perform some data preparation.
For detailed background information about data preparation, see About Preparing Data.
Typically, when you import draft data into a new or existing model, the data acquisition will happen in the background while you work on something else. keep the above xref near the top of this topic, because some readers will arrive at this topic looking for explanatory info about wrangling, which the linked-to topic contains, and this current topic is so long that readers might miss the related link at the bottom.
Load the raw data following the steps in the help topic specific to your workflow and data source:
- Under Layout, select the Card Viewkeep icon to display all the columns as
separate cards. Each card includes the column name, a specific summary of the
data corresponding to its type (mean value for numbers, date ranges for dates,
number of unique values for text, etc.), as well as a quality bar that shows in
green the proportion of values formatted correctly – those in red formatted
incorrectly and require action, and those in gray are empty.
You can sort the displayed cards in original, ascending, or descending order, or according to the model structure (listing all measures first, followed by dimensions). If you switch to (Grid View), the columns associated with your selected cards are automatically highlighted.
NoteUse the (Show/Hide) icon to limit the view to a few selected columns in either (Grid View) or Card View. A drop-down list of all the columns in the uploaded data is displayed under (Show/Hide). All columns listed with a check mark are displayed in the current view. Select a column name to show or hide the column. You can use the search tool to look for a specific column. You can also hide columns by directly selecting columns in the data integration view, and choosing . - The Details panel is divided into two sections:
- Model level: shows general information about the imported data, including any data-quality issues including mapping issues, and model-specific information. A Model Options section is provided to enable planning, set the default currency for the model, plus other options.
- Column level: allows you to map columns to existing dimensions in the model, displays histograms for the currently selected column (rectangles indicating frequency of data items in successive numerical intervals), lists the attribute and dimension types that you can apply to the column, and displays any data-quality issues represented as quality bars under Data Quality. The green section indicates correct values, while those in red indicate incorrect values. Those in gray are empty.
- When you select a column in (Grid
View) or Card
View, numerical or textual histograms are displayed under
Data Distribution.
- Text histograms are horizontal, and the values are clustered by count.
The number of clusters can be adjusted by dragging the slider shown
above the histogram. When a cluster contains more than one value, the
displayed count is the average count for the cluster. The count is
prefaced by a tilde symbol (~) if there are multiple different
occurrences. Expand the cluster for a more detailed view of the values
in the cluster along with individual counts. Use the search tool to look up specific column values, and press
Enter to initiate the search. When you select
a value in the histogram, the corresponding column is sorted and the
value is highlighted in the grid.NoteThe displayed histogram is determined by the data type of the column. A column with numbers could still be considered as text if the data type is set to text.
- Numerical histograms are vertical, and represent the range of values
along the x-axis. Hover over any bar to show the count, minimum, and
maximum values for the data in the bar. The number of bars can also be
adjusted by using the slider above the histogram. Toggling the
Show Outliers box includes or removes
outliers from the histogram.NoteBelow the histogram, there is a box and whisker plot to visualize the histogram's distribution of values.
- Text histograms are horizontal, and the values are clustered by count.
The number of clusters can be adjusted by dragging the slider shown
above the histogram. When a cluster contains more than one value, the
displayed count is the average count for the cluster. The count is
prefaced by a tilde symbol (~) if there are multiple different
occurrences. Expand the cluster for a more detailed view of the values
in the cluster along with individual counts. Use the search tool to look up specific column values, and press
Enter to initiate the search. When you select
a value in the histogram, the corresponding column is sorted and the
value is highlighted in the grid.
- You use level-based hierarchies when your data is organized into levels, such as Product Category, Product Group, and Product. When the data is displayed in a story, hierarchies can be expanded or collapsed. In the toolbar, select (Level Based Hierarchy). For more information, see Creating a Level-Based Hierarchy.
- Select any of the following Model Options:
- Specify whether you want the model to be Planning Enabled. For information about model types, see Planning Models and Analytic Models.
- Specify whether you want empty ID cells to be filled with automatically generated values. If you clear this check box, rows with empty ID cells will be omitted.
The Card View displays after you import data into an existing model. By default, cards representing dimensions and measures are displayed. Double-click a card to display attribute cards under the selected card.
Cards display as either mapped or unmapped entities. A card represents mapped data if it is shaded solid and has defined borders.
A card represents unmapped data if it does not have a border and appears transparent.
A red dot in the top right corner of the card indicates there are mapping errors that need to be addressed. A blue dot indicates that new values have been added from the import to the existing data.
Icon |
Data Type |
---|---|
Generic dimension | |
Date dimension | |
Account dimension | |
Measure | |
Version dimension | |
Organization dimension | |
Attribute only available for organization dimension | |
Description or property attributes (for example, unit, scale, or decimal) | |
Timestamp dimension | |
Hierarchy attribute |
When you select a card, the Details panel is populated with information specific to your selection – the number of rows, unique values, data types, and corresponding histograms. A pie chart is provided under Mapping Quality showing the proportion of Accepted versus Rejected values.
- Original Order: reflects the original order in the imported data source.
- A-Z: alphabetical order based on column headings.
- Z-A: reverse alphabetical order based on column headings.
- Mapped First: mapped columns listed alphabetically before unmapped columns.
- Unmapped First: unmapped columns listed alphabetically before mapped columns.
The following headings appear under color-coded progress bars indicating what percentage of the imported data was accepted: Dimensions, Attributes, and Measures. If a progress bar is not 100 percent blue, select the heading for more information on the mapping progress and a listing of unmapped entities.
Mapping imported data into an existing model:
- Determine which unmapped imported columns can be associated with specific transparent borderless cards in the card view.
- Drag and drop an unmapped imported column on the associated card.NoteIf you need to undo a mapping, hover over the card and select the (Cancel) button.
- Map the remaining imported columns.NoteThe final import cannot be started until all dimensions have been mapped to a corresponding card. You also need to map at least one imported measure. If there are fewer cards than imported columns, any extra columns will be ignored.NoteCheck Fiscal Year in Details to map imported fiscal period data into a date dimension in a model enabled to support fiscal year. Review the format listed under Fiscal Year Format and change it if required.
- To map an unmapped location dimension, select the corresponding card and from Map LocationsGeo by Area Name if your locations are based on area data, or the to display either the Map Latitude/Longitude dialog if your locations are based on coordinates. For coordinates, use the Latitude and Longitude fields to select corresponding columns. If your locations are based on area, select a column corresponding to Region and another for Sub-Region 1. choose
- Under Mapping Options select View all
options to choose any of the following options:
- Update dimension with new values: Select this
option if you want to add and update dimension members:
- New members in the dimension are added to the model.
- Changes to the description or properties of existing members are updated in the model.
- Changes to a hierarchy that has been previously defined are updated in the model.
Restriction- Dimension members can be updated if the dimension is private (i.e. not shared in multiple models). Updates to public dimensions are not supported.
- Account dimension members are updated only if a column in the imported dataset is mapped to that account dimension.
- Although changes to existing properties are updated, new properties are not added.
- Convert value symbol by account type: Choose how accounting data is handled in this import. Special data types for accounting data are available. These automatically apply negative signs to values in the database as necessary, although the data shown on screen is shown as a positive value (this applies to the types INC and LEQ – refer to the Account dimension type for more information). When this option is selected, the value symbol is matched to the account type. For example, expenses are negative values and revenues are positive values. See Attributes of an Account Dimension and Preparing Data to display either the for details.
- Fill applicable empty ID cells with the "#" value: When creating a new model, if member IDs are empty, they are automatically filled with the “#” value if you select this option. Otherwise, those rows are omitted from the model.
- Update dimension with new values: Select this
option if you want to add and update dimension members:
- Select the Import Method to determine how existing data
is handled.
A choice of three import methods is available.
NoteThese options affect measures and dimensions. To include both measures and dimensions, see Updating and Scheduling Models.Update Updates the existing data and adds new entries to the target model. The scope of this update is based on a combination of all dimensions in the target model. For a more refined scope, use either the Clean and replace selected version data or Clean and replace subset of data update options. Append Keeps the existing data as is and adds new entries to the target model. NoteWhen importing data into a planning model using the Append option, you can choose between aggregating duplicated rows, or rejecting them.Clean and replace selected version data Deletes the existing data and adds new entries to the target model, only for the versions that you specify in the import. You can choose to use either the existing version or specify a new version under Version. If you specify to import data for the "actual" version, only the data in the "actual" version is cleaned and replaced. Other versions, for example "planning", are not affected. Clean and replace subset of data Replaces existing data and adds new entries to the target model for a defined subset of the data based on a scope of selected versions using either the Existing Version or New Version buttons. You can also limit the scope to specific dimensions. To define a scope based on a combination of dimensions, select + Add Scope and use the Select a dimension field to specify a dimension.
When a Date dimension is defined in the scope, the time range in the source data (calculated based on the target model's granularity) combined with other dimensions in the scope will determine what existing data is cleaned and replaced.If for example, Date and Region dimensions are defined as part of a scope, only entries that fall within the time range and match Region from the source data will be replaced in the target model. Existing data that does not match the scope will be kept as is.NoteBlank dates (allowed in non-Planning models) are not part of the time range: They are matched directly in order to clean and replace data with blank dates in addition to data within the time range. - Choose which Category you want to use for the imported data.
- You use level-based hierarchies when your data is organized into levels, such as Product Category, Product Group, and Product. When the data is displayed in a story, hierarchies can be expanded or collapsed. On the toolbar, select (Level Based Hierarchy). For more information see Creating a Level-Based Hierarchy.
For details about storing data in non-leaf members, see Entering Values with Multiple Hierarchies.
- Under Transformations, select the
(Combine Data) icon.
The Let's add some data! dialog is displayed.
- Choose how you want to add data:
- Import a file.
- Acquire data from a data source.
Once the new data is loaded, the Combine Data dialog is displayed. Under Combine Settings, a table representing all columns in the original data is displayed on the left, and a table listing columns from the new dataset is on the right.
- Choose the most appropriate columns in each table to combine data. Click to
select, or drag columns to the field provided under Combine
Column for each table. NoteThe columns should be the best match and should not contain duplicates. You can use up to three columns to join the two datasets.
- Select how you want to combine the columns:
- All primary data: maintains the non-intersecting data from the original dataset.
- Intersecting data only: omits any data from the original dataset that does not match in the new data.
The results for the column combination in terms of Accepted, Duplicated, Nulls, and Omitted values are provided under Combine Preview. A sample of combined data rows is displayed under Data Sample.
- Select Combine when you are ready to combine the two
tables. NoteCurrently, when you combine data:
- The number of combined cells cannot exceed 30,000,000.
- The number of combined rows cannot exceed 1,000,000.
- The number of combined columns cannot exceed 100.
- You cannot run more than ten combine data processes on a given dataset.
- You cannot combine two datasets using a calculated column.
- You cannot combine two datasets in a story.
- Once the acquired data (or sample) is loaded in the data integration view,
select the icon from the
Actions menu.
The Transpose Columns into Rows panel appears in Details.
- Under Transpose Columns select columns you want to include in the new table format. Choose Select All to include all the listed columns.
- Set Preview on grid on to preview the new format.
- Select OK to implement the transform and to
actualize the new format.NoteTo undo the transformation, open the Transformation Log and delete the associated entry.
- If you are creating a new model, in the Model level: review the model name, change it if desired, and enter an optional description.
- To edit column names, double-click the column headers, and provide the new name.
- To define your dimensions when creating a new model, select a column in the
Column view, and expand
Modeling to specify any of the following dimension
types for the column:
- Generic
- Measure
- Date
- Timestamp
- Organization
- Account
- Version
If version data is present in your source file, select the column that contains the version. Set the Modeling type to Version to display an additional Version Mapping feature available in the Column tab. Select the Map Versions button to open an additional dialog where you can map a dimension category (such as Actuals or Forecast) to the version of the data being imported.NoteThe dimension types Organization, Account, and Version are not exposed in a model embedded in a story. - When creating a new model, you can select Add Dimension Attributes to
define other columns as attributes for the selected dimension.Note
- A dimension can have only one Description attribute, but several Property attributes. Custom dimension properties are not exposed in a model embedded in a story.
- A dimension ID member cannot have two different attribute values (for example, two different descriptions), but a single attribute value can have multiple IDs. For example, it's possible for two rows to have different IDs, but the same description.
- Currency is an attribute of an Organization dimension type. For more information on each dimension type, see About Dimensions.
- To create a location dimension column for a new model, you need to specify what
type of location dimension you want to include – location by region or location
by coordinates.
- Select the
(Geo Enrichment) icon under
Transformations, and choose either
Location by Region or Location by
Coordinates. Select columns to associate with
longitude/latitude or region/sub-region. A location dimension column is
created. When you use the Location by Coordinates option, you do not need to select a column to serve as location identifier. As you prepare the data, if you modify any of the coordinate data used to create the location dimension, the location dimension values will automatically reflect your modifications.NoteIf the coordinates or regions you have selected cannot be automatically resolved or are invalid, the issues will be listed under Data Quality.
- Select the
(Geo Enrichment) icon under
Transformations, and choose either
Location by Region or Location by
Coordinates. Select columns to associate with
longitude/latitude or region/sub-region. A location dimension column is
created.
- For measures, you can specify a Label (for example, cartons,
kilograms, etc.), Scale, and Decimal
Places.NoteFor a detailed description of these attributes, see Attributes of an Account Dimension.
- You should resolve any items that are listed in the Details panel under either the Data Quality or Model Requirements when creating a new model, or Mapping Quality if you are adding new data to an existing model. For more information, see the "Data Quality Considerations" section in About Preparing Data.
- Select a message to see the options available to resolve any identified quality issue. Use
the context-sensitive editing features and the transform bar to edit data in a
cell, column, or card. When selecting a column, a cell, or content in a cell, a
menu will appear with options for performing transforms. There are two parts to
this menu:
- Choose the Quick Actions option to perform actions such as duplicate column, trim whitespace, hide columns, or delete columns or rows. The table below lists all the available actions.
- Select the (Smart
Transformations) icon to list suggested transformations
to apply to the column, such as replacing the value in a cell with a
suggested value. You can also select Create a
Transform and choose from the options listed under the
transformation bar displayed above the columns. The transformation bar
is used to edit and create transforms.Note
When creating a new model, you can run Quick Actions and Smart Transformations directly in the Card View on a selected card.
As you hover over a suggested transformation, the anticipated results are previewed in the grid. To apply the transformation, simply select the transform. You can manually enter your own transformation in the transformation bar; as the transform is built, a preview is provided in the affected column, cell, or content within the cell. The table below summarizes the available transformations you can apply to selected columns, cells, or content within cells.
Transformation Description Transform Bar Format Delete Rows Delete rows in the data, either by selecting individual members, or by specifying a range (not possible for text columns; only numerical and Date columns). Provided only as a quick action Trim Whitespace Remove spaces, including non-printing characters, from the start and end of strings. Provided only as a quick action Duplicate Column Create a copy of an existing column. Provided only as a quick action Delete Column Delete a column. Use the Shift key to select and then delete multiple columns. Provided only as a quick action Remove duplicate rows Remove duplicate rows when creating or adding data to a model. Provided only as a task bar icon Concatenate Combine two or more columns into one. An optional value can be entered to separate the column values. Concatenate [<Column1>], [<Column2>]… using "value" Split Split a text column on a chosen delimiter, starting from left to right. The number of splits can be chosen by the user. Split [<Column>] on "delimiter" repeat "#" Extract Use this transform to extract a block of text specified as numbers, words, or targeted values within a column to a new column. Extract [<what to extract]>] [<where to extract>] [<which occurrence>] [""] from [<column name>] [<include value option> ]. Options for what to extract:
- number Limits extracted text to one number from column cell.
- word Limits extracted text to a word from column cell.
- everything Includes all text.
Options for where to extract relative to the target value:- before
- after
- betweenNoteYou must specify two target values when using between.
- containing: Extracts a word or number containing the target. For example, if your target is ship, both ship and shipping will be extracted.
- equal to: Extracts the specific target value if it exists in the cell.
Options for specifying occurrence:- first
- last
- occurrence: Allows you to specify
the position of the target from one to ten. NoteUse occurrence when there are multiple instances of the target.
To extract the first number from all column cells, you would specify the following:
Extract before first "" from [<column name>].To extract all text between parenthesis in column cells, you would specify:
Extract everything between "(" and ")" from [<column name>].Change Change a column to uppercase, lowercase, or title case. Change [<Column>] to (<UPPERCASE>/<lowercase>/<TitleCase>) Replace Replaces either an entire cell or content that could be found in multiple different cells. NoteYou can optionally select to add a <where> clause to the transform bar. You need to specify an associated column and value when using a <where> clause to limit the replace action in a given column.Replace (<cell/content>) in [<Column>] matching "value" with "value" With a <where> clause: Replace (<cell/content>) in [<Column>] matching "value" with "value" where[<Column>]is "value"
- A history of all transforms you implement is displayed in Transform Log. There are two levels of transformation logs recorded: transforms on the entire dataset, and transformations on the currently selected column. Hover over a specific transform in the Transform Log to highlight the impacted column. You can roll back the change by either deleting the entries in the history or using the (Undo/Redo) buttons on the toolbar. You can remove transforms in the Transform Log panel out of sequential order provided that there are no dependencies.
- a. If you're working with a dataset sample, choose Validate
Data to apply your transforms across the entire dataset and
check for data quality errors. NoteAny newly detected errors will be retrieved in the sample you're working on, and appear under the Quality Issues or Model Requirements areas in the Details panel. You should resolve these errors before proceeding.
- b. Choose either Create Model to import the entire dataset and create
your new model, or if you're adding data to a model or story, choose
Finish Mapping.NoteIf you choose to create the model without resolving detected errors, you'll be asked if you wish to continue. Depending on the types of errors, either entire rows or invalid cells will be omitted from the new model.NoteIf any rows in the dataset were rejected, you can select Download Rejected Rows to save the rejected rows as a .csv file. You can then examine this file to see which data was rejected. You could fix the data in the .csv file and then import the .csv file into the model using the Import Data workflow, or fix the data in the source system.NoteOnce you have saved the model, you can view the data mapping and transform history. For more information, see View Import Job Mapping and Transforms.