Preparing Data

After importing raw data into a new or existing model, or into a story, you may need to perform some data preparation.

For detailed background information about data preparation, see About Preparing Data.

Workflow context
Note
Depending on the workflow you are performing, some of the functionality described below will vary slightly.
Importing the data

Typically, when you import draft data into a new or existing model, the data acquisition will happen in the background while you work on something else. keep the above xref near the top of this topic, because some readers will arrive at this topic looking for explanatory info about wrangling, which the linked-to topic contains, and this current topic is so long that readers might miss the related link at the bottom.

Load the raw data following the steps in the help topic specific to your workflow and data source:

Note
If you import a large-volume dataset, you'll be informed that a data sample will be displayed rather than the entire dataset. Choose OK to continue.
Viewing the data imported to create a new model
During the initial data import, the data type of each column is inferred, as well as dimension type in the data integration view. In the right hand Details side panel, you will have a summary of the characteristics of the dataset.
  1. Under Layout, select the Card Viewkeep icon to display all the columns as separate cards. Each card includes the column name, a specific summary of the data corresponding to its type (mean value for numbers, date ranges for dates, number of unique values for text, etc.), as well as a quality bar that shows in green the proportion of values formatted correctly – those in red formatted incorrectly and require action, and those in gray are empty.

    You can sort the displayed cards in original, ascending, or descending order, or according to the model structure (listing all measures first, followed by dimensions). If you switch to (Grid View), the columns associated with your selected cards are automatically highlighted.

    Note
    Use the (Show/Hide) icon to limit the view to a few selected columns in either (Grid View) or Card View. A drop-down list of all the columns in the uploaded data is displayed under (Show/Hide). All columns listed with a check mark are displayed in the current view. Select a column name to show or hide the column. You can use the search tool to look for a specific column. You can also hide columns by directly selecting columns in the data integration view, and choosing Start of the navigation path (More Actions) Next navigation step (Hide selected)End of the navigation path.
  2. The Details panel is divided into two sections:
    • Model level: shows general information about the imported data, including any data-quality issues including mapping issues, and model-specific information. A Model Options section is provided to enable planning, set the default currency for the model, plus other options.
    • Column level: allows you to map columns to existing dimensions in the model, displays histograms for the currently selected column (rectangles indicating frequency of data items in successive numerical intervals), lists the attribute and dimension types that you can apply to the column, and displays any data-quality issues represented as quality bars under Data Quality. The green section indicates correct values, while those in red indicate incorrect values. Those in gray are empty.
  3. When you select a column in (Grid View) or Card View, numerical or textual histograms are displayed under Data Distribution.
    • Text histograms are horizontal, and the values are clustered by count. The number of clusters can be adjusted by dragging the slider shown above the histogram. When a cluster contains more than one value, the displayed count is the average count for the cluster. The count is prefaced by a tilde symbol (~) if there are multiple different occurrences. Expand the cluster for a more detailed view of the values in the cluster along with individual counts. Use the search tool to look up specific column values, and press Enter to initiate the search. When you select a value in the histogram, the corresponding column is sorted and the value is highlighted in the grid.
      Note
      The displayed histogram is determined by the data type of the column. A column with numbers could still be considered as text if the data type is set to text.
    • Numerical histograms are vertical, and represent the range of values along the x-axis. Hover over any bar to show the count, minimum, and maximum values for the data in the bar. The number of bars can also be adjusted by using the slider above the histogram. Toggling the Show Outliers box includes or removes outliers from the histogram.
      Note
      Below the histogram, there is a box and whisker plot to visualize the histogram's distribution of values.
  4. You use level-based hierarchies when your data is organized into levels, such as Product Category, Product Group, and Product. When the data is displayed in a story, hierarchies can be expanded or collapsed. In the toolbar, select (Level Based Hierarchy). For more information, see Creating a Level-Based Hierarchy.
  5. Select any of the following Model Options:
    • Specify whether you want the model to be Planning Enabled. For information about model types, see Planning Models and Analytic Models.
    • Specify whether you want empty ID cells to be filled with automatically generated values. If you clear this check box, rows with empty ID cells will be omitted.
Note
Data that includes a date dimension needs to be imported using the calendar year for the initial model creation. Once you enable the fiscal year setting, then you can later import and map data stored in fiscal years.
Viewing and mapping data imported to an existing model

The Card View displays after you import data into an existing model. By default, cards representing dimensions and measures are displayed. Double-click a card to display attribute cards under the selected card.

Cards display as either mapped or unmapped entities. A card represents mapped data if it is shaded solid and has defined borders.

A card represents unmapped data if it does not have a border and appears transparent.

A red dot in the top right corner of the card indicates there are mapping errors that need to be addressed. A blue dot indicates that new values have been added from the import to the existing data.

Cards display sample values and icons for specific data types. The table below summarizes the available data typology:
Icon

Data Type

Generic dimension
Date dimension
Account dimension
Measure
Version dimension
Organization dimension
Attribute only available for organization dimension
Description or property attributes (for example, unit, scale, or decimal)
Timestamp dimension
Hierarchy attribute

When you select a card, the Details panel is populated with information specific to your selection – the number of rows, unique values, data types, and corresponding histograms. A pie chart is provided under Mapping Quality showing the proportion of Accepted versus Rejected values.

To the left of the card view, a panel containing the imported data columns is displayed. Use any of the following sort options:
  • Original Order: reflects the original order in the imported data source.
  • A-Z: alphabetical order based on column headings.
  • Z-A: reverse alphabetical order based on column headings.
  • Mapped First: mapped columns listed alphabetically before unmapped columns.
  • Unmapped First: unmapped columns listed alphabetically before mapped columns.
You can use the search tool to look for a specific column. When you select an entry in the imported column panel, the Details panel is populated with information specific to your selection: the number of rows, unique values, data types, and corresponding histograms. Use this information to determine what kind of data is contained in the imported column, and this information will be invaluable for mapping purposes.

The following headings appear under color-coded progress bars indicating what percentage of the imported data was accepted: Dimensions, Attributes, and Measures. If a progress bar is not 100 percent blue, select the heading for more information on the mapping progress and a listing of unmapped entities.

Mapping imported data into an existing model:

  1. Determine which unmapped imported columns can be associated with specific transparent borderless cards in the card view.
  2. Drag and drop an unmapped imported column on the associated card.
    Note
    If you need to undo a mapping, hover over the card and select the (Cancel) button.
  3. Map the remaining imported columns.
    Note
    The final import cannot be started until all dimensions have been mapped to a corresponding card. You also need to map at least one imported measure. If there are fewer cards than imported columns, any extra columns will be ignored.
    Note
    Check Fiscal Year in Details to map imported fiscal period data into a date dimension in a model enabled to support fiscal year. Review the format listed under Fiscal Year Format and change it if required.
  4. To map an unmapped location dimension, select the corresponding card and from Start of the navigation pathDetails Next navigation step  LocationEnd of the navigation path choose Map LocationsGeo by Area Name if your locations are based on area data, or the to display either the Map Latitude/Longitude dialog if your locations are based on coordinates. For coordinates, use the Latitude and Longitude fields to select corresponding columns. If your locations are based on area, select a column corresponding to Region and another for Sub-Region 1.
  5. Under Mapping Options select View all options to choose any of the following options:
    • Update dimension with new values: Select this option if you want to add and update dimension members:
      • New members in the dimension are added to the model.
      • Changes to the description or properties of existing members are updated in the model.
      • Changes to a hierarchy that has been previously defined are updated in the model.
      Restriction
      • Dimension members can be updated if the dimension is private (i.e. not shared in multiple models). Updates to public dimensions are not supported.
      • Account dimension members are updated only if a column in the imported dataset is mapped to that account dimension.
      • Although changes to existing properties are updated, new properties are not added.
    • Convert value symbol by account type: Choose how accounting data is handled in this import. Special data types for accounting data are available. These automatically apply negative signs to values in the database as necessary, although the data shown on screen is shown as a positive value (this applies to the types INC and LEQ – refer to the Account dimension type for more information). When this option is selected, the value symbol is matched to the account type. For example, expenses are negative values and revenues are positive values. See Attributes of an Account Dimension and Preparing Data to display either the for details.
    • Fill applicable empty ID cells with the "#" value: When creating a new model, if member IDs are empty, they are automatically filled with the “#” value if you select this option. Otherwise, those rows are omitted from the model.
  6. Select the Import Method to determine how existing data is handled.

    A choice of three import methods is available.

    Note
    These options affect measures and dimensions. To include both measures and dimensions, see Updating and Scheduling Models.
    Update Updates the existing data and adds new entries to the target model. The scope of this update is based on a combination of all dimensions in the target model. For a more refined scope, use either the Clean and replace selected version data or Clean and replace subset of data update options.
    Append Keeps the existing data as is and adds new entries to the target model.
    Note
    When importing data into a planning model using the Append option, you can choose between aggregating duplicated rows, or rejecting them.
    Clean and replace selected version data Deletes the existing data and adds new entries to the target model, only for the versions that you specify in the import. You can choose to use either the existing version or specify a new version under Version. If you specify to import data for the "actual" version, only the data in the "actual" version is cleaned and replaced. Other versions, for example "planning", are not affected.
    Clean and replace subset of data

    Replaces existing data and adds new entries to the target model for a defined subset of the data based on a scope of selected versions using either the Existing Version or New Version buttons. You can also limit the scope to specific dimensions. To define a scope based on a combination of dimensions, select + Add Scope and use the Select a dimension field to specify a dimension.

    When a Date dimension is defined in the scope, the time range in the source data (calculated based on the target model's granularity) combined with other dimensions in the scope will determine what existing data is cleaned and replaced.
    Note
    Blank dates (allowed in non-Planning models) are not part of the time range: They are matched directly in order to clean and replace data with blank dates in addition to data within the time range.
    If for example, Date and Region dimensions are defined as part of a scope, only entries that fall within the time range and match Region from the source data will be replaced in the target model. Existing data that does not match the scope will be kept as is.
  7. Choose which Category you want to use for the imported data.
  8. You use level-based hierarchies when your data is organized into levels, such as Product Category, Product Group, and Product. When the data is displayed in a story, hierarchies can be expanded or collapsed. On the toolbar, select (Level Based Hierarchy). For more information see Creating a Level-Based Hierarchy.
Note
For planning models, if you want to omit validation for specific hierarchies, to allow non-leaf members to contain fact data, click Select hierarchies in the Conditional Validation section of the Details panel, and then select the hierarchies that you want to omit from validation.

For details about storing data in non-leaf members, see Entering Values with Multiple Hierarchies.

Combining data with your acquired data
You can combine data from another source with your acquired data by using up to three matching columns to join the two datasets.
Note
This workflow does not apply to a model embedded in a story.
  1. Under Transformations, select the (Combine Data) icon.

    The Let's add some data! dialog is displayed.

  2. Choose how you want to add data:
    • Import a file.
    • Acquire data from a data source.

    Once the new data is loaded, the Combine Data dialog is displayed. Under Combine Settings, a table representing all columns in the original data is displayed on the left, and a table listing columns from the new dataset is on the right.

  3. Choose the most appropriate columns in each table to combine data. Click to select, or drag columns to the field provided under Combine Column for each table.
    Note
    The columns should be the best match and should not contain duplicates. You can use up to three columns to join the two datasets.
  4. Select how you want to combine the columns:
    • All primary data: maintains the non-intersecting data from the original dataset.
    • Intersecting data only: omits any data from the original dataset that does not match in the new data.

    The results for the column combination in terms of Accepted, Duplicated, Nulls, and Omitted values are provided under Combine Preview. A sample of combined data rows is displayed under Data Sample.

  5. Select Combine when you are ready to combine the two tables.
    Note
    Currently, when you combine data:
    • The number of combined cells cannot exceed 30,000,000.
    • The number of combined rows cannot exceed 1,000,000.
    • The number of combined columns cannot exceed 100.
    • You cannot run more than ten combine data processes on a given dataset.
    • You cannot combine two datasets using a calculated column.
    • You cannot combine two datasets in a story.
Importing cross table data
When you upload acquired data containing cross tables to the data integration view, the Transpose Columns into Rows action enables you to change selected columns into rows. The new table format can then be used to either import data or create a new model.
Note
You can only perform one transpose transformation per session on a dataset. The table resulting from the transformation cannot contain more than 50 million cells.
  1. Once the acquired data (or sample) is loaded in the data integration view, select the icon from the Actions menu.

    The Transpose Columns into Rows panel appears in Details.

  2. Under Transpose Columns select columns you want to include in the new table format. Choose Select All to include all the listed columns.
  3. Set Preview on grid on to preview the new format.
  4. Select OK to implement the transform and to actualize the new format.
    Note
    To undo the transformation, open the Transformation Log and delete the associated entry.
Improving data quality
  1. If you are creating a new model, in the Model level: review the model name, change it if desired, and enter an optional description.
  2. To edit column names, double-click the column headers, and provide the new name.
  3. To define your dimensions when creating a new model, select a column in the Column view, and expand Modeling to specify any of the following dimension types for the column:
    • Generic
    • Measure
    • Date
    • Timestamp
    • Organization
    • Account
    • Version
    Note
    The dimension types Organization, Account, and Version are not exposed in a model embedded in a story.
    If version data is present in your source file, select the column that contains the version. Set the Modeling type to Version to display an additional Version Mapping feature available in the Column tab. Select the Map Versions button to open an additional dialog where you can map a dimension category (such as Actuals or Forecast) to the version of the data being imported.
  4. When creating a new model, you can select Add Dimension Attributes to define other columns as attributes for the selected dimension.
    Note
    • A dimension can have only one Description attribute, but several Property attributes. Custom dimension properties are not exposed in a model embedded in a story.
    • A dimension ID member cannot have two different attribute values (for example, two different descriptions), but a single attribute value can have multiple IDs. For example, it's possible for two rows to have different IDs, but the same description.
    • Currency is an attribute of an Organization dimension type. For more information on each dimension type, see About Dimensions.
  5. To create a location dimension column for a new model, you need to specify what type of location dimension you want to include – location by region or location by coordinates.
    • Select the (Geo Enrichment) icon under Transformations, and choose either Location by Region or Location by Coordinates. Select columns to associate with longitude/latitude or region/sub-region. A location dimension column is created.
      Note
      If the coordinates or regions you have selected cannot be automatically resolved or are invalid, the issues will be listed under Data Quality.
      When you use the Location by Coordinates option, you do not need to select a column to serve as location identifier. As you prepare the data, if you modify any of the coordinate data used to create the location dimension, the location dimension values will automatically reflect your modifications.
  6. For measures, you can specify a Label (for example, cartons, kilograms, etc.), Scale, and Decimal Places.
    Note
    For a detailed description of these attributes, see Attributes of an Account Dimension.
  7. You should resolve any items that are listed in the Details panel under either the Data Quality or Model Requirements when creating a new model, or Mapping Quality if you are adding new data to an existing model. For more information, see the "Data Quality Considerations" section in About Preparing Data.
  8. Select a message to see the options available to resolve any identified quality issue. Use the context-sensitive editing features and the transform bar to edit data in a cell, column, or card. When selecting a column, a cell, or content in a cell, a menu will appear with options for performing transforms. There are two parts to this menu:
    • Choose the Quick Actions option to perform actions such as duplicate column, trim whitespace, hide columns, or delete columns or rows. The table below lists all the available actions.
    • Select the (Smart Transformations) icon to list suggested transformations to apply to the column, such as replacing the value in a cell with a suggested value. You can also select Create a Transform and choose from the options listed under the transformation bar displayed above the columns. The transformation bar is used to edit and create transforms.
      Note

      When creating a new model, you can run Quick Actions and Smart Transformations directly in the Card View on a selected card.

    As you hover over a suggested transformation, the anticipated results are previewed in the grid. To apply the transformation, simply select the transform. You can manually enter your own transformation in the transformation bar; as the transform is built, a preview is provided in the affected column, cell, or content within the cell. The table below summarizes the available transformations you can apply to selected columns, cells, or content within cells.

    Transformation Description Transform Bar Format
    Delete Rows Delete rows in the data, either by selecting individual members, or by specifying a range (not possible for text columns; only numerical and Date columns). Provided only as a quick action
    Trim Whitespace Remove spaces, including non-printing characters, from the start and end of strings. Provided only as a quick action
    Duplicate Column Create a copy of an existing column. Provided only as a quick action
    Delete Column Delete a column. Use the Shift key to select and then delete multiple columns. Provided only as a quick action
    Remove duplicate rows Remove duplicate rows when creating or adding data to a model. Provided only as a task bar icon
    Concatenate Combine two or more columns into one. An optional value can be entered to separate the column values. Concatenate [<Column1>], [<Column2>]… using "value"
    Split Split a text column on a chosen delimiter, starting from left to right. The number of splits can be chosen by the user. Split [<Column>] on "delimiter" repeat "#"
    Extract Use this transform to extract a block of text specified as numbers, words, or targeted values within a column to a new column. Extract [<what to extract]>] [<where to extract>] [<which occurrence>] [""] from [<column name>] [<include value option> ].

    Options for what to extract:

    • number Limits extracted text to one number from column cell.
    • word Limits extracted text to a word from column cell.
    • everything Includes all text.
    Options for where to extract relative to the target value:
    • before
    • after
    • between
      Note
      You must specify two target values when using between.
    • containing: Extracts a word or number containing the target. For example, if your target is ship, both ship and shipping will be extracted.
    • equal to: Extracts the specific target value if it exists in the cell.
    Options for specifying occurrence:
    • first
    • last
    • occurrence: Allows you to specify the position of the target from one to ten.
      Note
      Use occurrence when there are multiple instances of the target.

    To extract the first number from all column cells, you would specify the following:

    Extract before first "" from [<column name>].

    To extract all text between parenthesis in column cells, you would specify:

    Extract everything between "(" and ")" from [<column name>].
    Change Change a column to uppercase, lowercase, or title case. Change [<Column>] to (<UPPERCASE>/<lowercase>/<TitleCase>)
    Replace Replaces either an entire cell or content that could be found in multiple different cells.
    Note
    You can optionally select Start of the navigation path Next navigation step whereEnd of the navigation path to add a <where> clause to the transform bar. You need to specify an associated column and value when using a <where> clause to limit the replace action in a given column.
    Replace (<cell/content>) in [<Column>] matching "value" with "value"

    With a <where> clause: Replace (<cell/content>) in [<Column>] matching "value" with "value" where[<Column>]is "value"

  9. A history of all transforms you implement is displayed in Transform Log. There are two levels of transformation logs recorded: transforms on the entire dataset, and transformations on the currently selected column. Hover over a specific transform in the Transform Log to highlight the impacted column. You can roll back the change by either deleting the entries in the history or using the (Undo/Redo) buttons on the toolbar. You can remove transforms in the Transform Log panel out of sequential order provided that there are no dependencies.
Completing data preparation
Once you've finished preparing your data and have resolved all data or model quality issues, you're ready to create the new model or finish mapping the data you're adding to the model or story.
Note
When embedding a model in a story, there are no Create Model or Finish Mapping options. Once you switch from Data to Story or Data Exploration view, the model is automatically embedded in the story.
  1. a. If you're working with a dataset sample, choose Validate Data to apply your transforms across the entire dataset and check for data quality errors.
    Note
    Any newly detected errors will be retrieved in the sample you're working on, and appear under the Quality Issues or Model Requirements areas in the Details panel. You should resolve these errors before proceeding.
  2. b. Choose either Create Model to import the entire dataset and create your new model, or if you're adding data to a model or story, choose Finish Mapping.
    Note
    If you choose to create the model without resolving detected errors, you'll be asked if you wish to continue. Depending on the types of errors, either entire rows or invalid cells will be omitted from the new model.
    Note
    If any rows in the dataset were rejected, you can select Download Rejected Rows to save the rejected rows as a .csv file. You can then examine this file to see which data was rejected. You could fix the data in the .csv file and then import the .csv file into the model using the Import Data workflow, or fix the data in the source system.
    Note
    Once you have saved the model, you can view the data mapping and transform history. For more information, see View Import Job Mapping and Transforms.