Preparing Data

After importing raw data into a new or existing model, or into a story, you may need to perform some data preparation.

For detailed background information about data preparation, see About Preparing Data.

Workflow context
Note
Depending on the workflow you are performing, some of the functionality described below will vary slightly.
Importing the data

Typically, when you import draft data into a new or existing model, the data acquisition will happen in the background while you work on something else.

Load the raw data following the steps in the help topic specific to your workflow and data source:

Note
If you import a large-volume dataset, you'll be informed that a data sample will be displayed rather than the entire dataset. Choose OK to continue.
Viewing the data imported to create a new model
During the initial data import, the data type of each column is inferred, as well as dimension type in the Data Integration view. In the right hand Details side panel, you will have a summary of the characteristics of the dataset.
  1. Under Mode, select the Card View icon to display all the columns as separate cards. Each card includes the column name, a specific summary of the data corresponding to its type (mean value for numbers, date ranges for dates, number of unique values for text, etc.), as well as a quality bar that shows in green the proportion of values formatted correctly – those in red formatted incorrectly and require action, and those in gray are empty.

    You can sort the displayed cards in original, ascending, or descending order, or according to the model structure (listing all measures first, followed by dimensions). If you switch to (Table View), the columns associated with your selected cards are automatically highlighted.

    Note
    Use the (Show/Hide) icon to limit the view to a few selected columns in either (Table View) or Card View. A drop-down list of all the columns in the uploaded data is displayed under (Show/Hide). All columns listed with a check mark are displayed in the current view. Select a column name to show or hide the column. You can use the search tool to look for a specific column. You can also hide columns by directly selecting columns in the data integration view, and choosing Start of the navigation path (More Actions) Next navigation step (Hide selected)End of the navigation path.
  2. The Details panel is divided into two sections:
    • Model level: shows general information about the imported data, including any data-quality issues including mapping issues, and model-specific information. A Model Options section is provided to enable planning, set the default currency for the model, plus other options.
    • Column level: allows you to map columns to existing dimensions in the model, displays histograms for the currently selected column (rectangles indicating frequency of data items in successive numerical intervals), lists the attribute and dimension types that you can apply to the column, and displays any data-quality issues represented as quality bars under Data Quality. The green section indicates correct values, while those in red indicate incorrect values. Those in gray are empty.
      Note
      The displayed column icon reflects the column data type: .
  3. When you select a column in (Table View) or Card View, numerical or textual histograms are displayed under Data Distribution.
    • Text histograms are horizontal, and the values are clustered by count. The number of clusters can be adjusted by dragging the slider shown above the histogram. When a cluster contains more than one value, the displayed count is the average count for the cluster. The count is prefaced by a tilde symbol (~) if there are multiple different occurrences. Expand the cluster for a more detailed view of the values in the cluster along with individual counts. Use the search tool to look up specific column values, and press Enter to initiate the search. When you select a value in the histogram, the corresponding column is sorted and the value is highlighted in the grid.
      Note
      The displayed histogram is determined by the data type of the column. A column with numbers could still be considered as text if the data type is set to text.
    • Numerical histograms are vertical, and represent the range of values along the x-axis. Hover over any bar to show the count, minimum, and maximum values for the data in the bar. The number of bars can also be adjusted by using the slider above the histogram. Toggling the Show Outliers box includes or removes outliers from the histogram.
      Note
      Below the histogram, there is a box and whisker plot to visualize the histogram's distribution of values.
  4. You use level-based hierarchies when your data is organized into levels, such as Product Category, Product Group, and Product. When the data is displayed in a story, hierarchies can be expanded or collapsed. In the toolbar, select (Level Based Hierarchy). For more information, see Creating a Level-Based Hierarchy.
  5. Select any of the following Model Options:
    • Specify whether you want the model to be Planning Enabled. For information about model types, see Planning Models and Analytics Models.
    • Specify whether you want empty ID cells to be filled with automatically generated values. If you clear this check box, rows with empty ID cells will be omitted.
Viewing and mapping data imported to an existing model

The Card View displays after you import data into an existing model. By default, cards representing dimensions and measures are displayed. Double-click a card to display attribute cards under the selected card.

Cards display as either mapped or unmapped entities. A card represents mapped data if it is shaded solid and has defined borders.

A card represents unmapped data if it does not have a border and appears transparent.

A red dot in the top right corner of the card indicates there are mapping errors that need to be addressed. A blue dot indicates that new values have been added from the import to the existing data.

Cards display sample values and icons for specific data types. The table below summarizes the available data typology:
Icon

Data Type

Generic dimension
Date dimension
Account dimension
Measure
Version dimension
Organization dimension
Attribute only available for organization dimension
Description or property attributes (for example, unit, scale, or decimal)
Timestamp dimension
Hierarchy attribute

When you select a card, the Details panel is populated with information specific to your selection – the number of rows, unique values, data types, and corresponding histograms. A pie chart is provided under Mapping Quality showing the proportion of Accepted versus Rejected values.

To the left of the card view, a panel containing the imported data columns is displayed. Use any of the following sort options:
  • Original Order: reflects the original order in the imported data source.
  • A-Z: alphabetical order based on column headings.
  • Z-A: reverse alphabetical order based on column headings.
  • Mapped First: mapped columns listed alphabetically before unmapped columns.
  • Unmapped First: unmapped columns listed alphabetically before mapped columns.
You can use the search tool to look for a specific column. When you select an entry in the imported column panel, the Details panel is populated with information specific to your selection: the number of rows, unique values, data types, and corresponding histograms. Use this information to determine what kind of data is contained in the imported column, and this information will be invaluable for mapping purposes.

The following headings appear under color-coded progress bars indicating what percentage of the imported data was accepted: Dimensions, Attributes, and Measures. If a progress bar is not 100 percent blue, select the heading for more information on the mapping progress and a listing of unmapped entities.

Mapping imported data into an existing model:

  1. Determine which unmapped imported columns can be associated with specific transparent borderless cards in the card view.
  2. Drag and drop an unmapped imported column on the associated card.
    Note
    If you need to undo a mapping, hover over the card and select the (Cancel) button.
  3. Map the remaining imported columns.
    Note
    The final import cannot be started until all dimensions have been mapped to a corresponding card. You also need to map at least one imported measure. If there are fewer cards than imported columns, any extra columns will be ignored.
    Note
    Check Fiscal Year in Details to map imported fiscal period data into a date dimension in a model enabled to support fiscal year. Review the format listed under Fiscal Year Format and change it if required.
  4. To map an unmapped location dimension, select the corresponding card and from Start of the navigation pathDetails Next navigation step  LocationEnd of the navigation path choose Map LocationsGeo by Area Name if your locations are based on area data, or the to display either the Map Latitude/Longitude dialog if your locations are based on coordinates. For coordinates, use the Latitude and Longitude fields to select corresponding columns. If your locations are based on area, select a column corresponding to Region and another for Sub-Region 1.
  5. Under Mapping Options select View all options to choose any of the following options:
    • Update dimension with new values: Select this option if you want to add and update dimension members:
      • New members in the dimension are added to the model.
      • Changes to the description or properties of existing members are updated in the model.
      • Changes to a hierarchy that has been previously defined are updated in the model.
      Restriction
      • Dimension members can be updated if the dimension is not shared among multiple models.
      • Account dimension members are updated only if a column in the imported dataset is mapped to that account dimension.
      • Although changes to existing properties are updated, new properties are not added.
    • Convert value symbol by account type: Choose how accounting data is handled in this import. Special data types for accounting data are available. These automatically apply negative signs to values in the database as necessary, although the data shown on screen is shown as a positive value (this applies to the types INC and LEQ – refer to the Account dimension type for more information). When this option is selected, the value symbol is matched to the account type. For example, expenses are negative values and revenues are positive values. See Attributes of an Account Dimension and Preparing Data to display either the for details.
    • Fill applicable empty ID cells with the "#" value: When creating a new model, if member IDs are empty, they are automatically filled with the “#” value if you select this option. Otherwise, those rows are omitted from the model.
  6. Select the Import Method to determine how existing data is handled.

    A choice of three import methods is available.

    Note
    These options affect measures and dimensions. To include both measures and dimensions, see Updating and Scheduling Models.
    Update Updates the existing data and adds new entries to the target model. The scope of this update is based on a combination of all dimensions in the target model. For a more refined scope, use either the Clean and replace selected version data or Clean and replace subset of data update options.
    Append Keeps the existing data as is and adds new entries to the target model.
    Clean and replace selected version data Deletes the existing data and adds new entries to the target model, only for the versions that you specify in the import. You can choose to use either the existing version or specify a new version under Version. If you specify to import data for the "actual" version, only the data in the "actual" version is cleaned and replaced. Other versions, for example "planning", are not affected.
    Clean and replace subset of data

    Replaces existing data and adds new entries to the target model for a defined subset of the data based on a scope of selected versions using either the Existing Version or New Version buttons. You can also limit the scope to specific dimensions. To define a scope based on a combination of dimensions, select + Add Scope and use the Select a dimension field to specify a dimension.

    When a Date dimension is defined in the scope, the time range in the source data (calculated based on the target model's granularity) combined with other dimensions in the scope will determine what existing data is cleaned and replaced.

    If for example, Date and Region dimensions are defined as part of a scope, only entries that fall within the time range and match Region from the source data will be replaced in the target model. Existing data that does not match the scope will be kept as is.
  7. Choose which Category you want to use for the imported data.
  8. You use level-based hierarchies when your data is organized into levels, such as Product Category, Product Group, and Product. When the data is displayed in a story, hierarchies can be expanded or collapsed. In the toolbar, select (Level Based Hierarchy). For more information see Creating a Level-Based Hierarchy.
Improving data quality
  1. If you are creating a new model, in the Model level: review the model name, change it if desired, and enter an optional description.
  2. To edit column names, double-click the column headers, and provide the new name.
  3. To define your dimensions when creating a new model, select a column in the Column view, and expand Modeling to specify any of the following dimension types for the column:
    • Dimension
    • Measure
    • Date
    • Timestamp
    • Organization
    • Account
    • Version

    If version data is present in your source file, select the column that contains the version. Set the Modeling type to Version to display an additional Version Mapping feature available in the Column tab. Select the Map Versions button to open an additional dialog where you can map a dimension category (such as Actuals or Forecast) to the version of the data being imported.

  4. When creating a new model you can select Add Dimension Attributes to define other columns as attributes for the selected dimension.
    Note
    • A dimension can have only one Description attribute, but several Property attributes.
    • A dimension ID member cannot have two different attribute values (for example, two different descriptions), but a single attribute value can have multiple IDs. For example, it's possible for two rows to have different IDs, but the same description.
    • Currency is an attribute of an Organization dimension type. For more information on each dimension type, see About Dimensions.
  5. To create a location dimension column for a new model, you need to specify what type of location dimension you want to include - location by region or location by coordinates.
    • Select the (Geo Enrichment) icon under Transformations, and choose either Location by Region or Location by Coordinates. Select columns to associate with longitude/latitude or region/sub-region. A location dimension column is created.
      Note
      If the coordinates or regions you have selected cannot be automatically resolved or are invalid, the issues will be listed under Data Quality.
      When you use the Location by Coordinates option you do not need to select a column to serve as location identifier. As you prepare the data, if you modify any of the coordinate data used to create the location dimension, the location dimension values will automatically reflect your modifications.
  6. For measures, you can add a unit label: for example, cartons, kilograms, etc.
  7. You should resolve any items that are listed in the Details panel under either the Data Quality or Model Requirements when creating a new model, or Mapping Quality if you are adding new data to an existing model. For more information, see the "Data Quality Considerations" section in About Preparing Data.
  8. Select a message to see the options available to resolve any identified quality issue. Use the context sensitive editing features and the transform bar to edit data in a cell, column, or card. When selecting a column, a cell, or content in a cell, a menu will appear with options for performing transforms. There are two parts to this menu:
    • Choose the Quick Actions option to perform actions such as duplicate column, trim whitespace, hide columns, or delete columns or rows. The table below lists all the available actions.
    • Select the (Smart Transformations) icon to list suggested transformations to apply to the column, such as replacing the value in a cell with a suggested value. You can also select Create a Transform and choose from the options listed under the transformation bar displayed above the columns. The transformation bar is used to edit and create transforms.
      Note

      When creating a new model, you can run Quick Actions and Smart Transformations directly in the Card View on a selected card.

    As you hover over a suggested transformation, the anticipated results are previewed in the grid. To apply the transformation, simply select the transform. You can manually enter your own transformation in the transformation bar; as the transform is built, a preview is provided in the affected column, cell, or content within the cell. The table below summarizes the available transformations you can apply to selected columns, cells, or content within cells.

    Transformation Description Transform Bar Format
    Delete Rows Delete rows in the data, either by selecting individual members, or by specifying a range (not possible for text columns; only numerical and Date columns). Provided only as a quick action
    Trim Whitespace Remove spaces, including non-printing characters, from the start and end of strings. Provided only as a quick action
    Duplicate Column Create a copy of an existing column. Provided only as a quick action
    Delete Column Delete a column. Use the Shift key to select and then delete multiple columns. Provided only as a quick action
    Concatenate Combine two or more columns into one. An optional value can be entered to separate the column values. Concatenate [<Column1>], [<Column2>]… using "value"
    Split Split a text column on a chosen delimiter, starting from left to right. The number of splits can be chosen by the user. Split [<Column>] on "delimiter" repeat "#"
    Extract Use this transform to extract a block of text specified as numbers, words, or targeted values within a column to a new column. Extract [<what to extract]>] [<where to extract>] [<which occurrence>] [""] from [<column name>] [<include value option> ].

    Options for what to extract:

    • number Limits extracted text to one number from column cell.
    • word Limits extracted text to a word from column cell.
    • everything Includes all text.
    Options for where to extract relative to the target value:
    • before
    • after
    • between
      Note
      You must specify two target values when using between.
    • containing: Extracts a word or number containing the target. For example, if your target is ship, both ship shipping will be extracted.
    • equal to: Extracts the specific target value if it exists in the cell.
    Options for specifying occurrence:
    • first
    • last
    • occurrence: Allows you to specify the position of the target from one to ten.
      Note
      Use occurrence when there are multiple instances of the target.

    To extract the first number from all column cells, you would specify the following:

    Extract before first "" from [<column name>].

    To extract all text between parenthesis in column cells, you would specify:

    Extract everything between "(" and ")" from [<column name>].
    Change Change a column to uppercase, lowercase, or title case. Change [<Column>] to (<UPPERCASE>/<lowercase>/<TitleCase>)
    Replace Replaces either an entire cell or content that could be found in multiple different cells.
    Note
    You can optionally select Start of the navigation path Next navigation step whereEnd of the navigation path to add a <where> clause to the transform bar. You need to specify an associated column and value when using a <where> clause to limit the replace action in a given column.
    Replace (<cell/content>) in [<Column>] matching "value" with "value"

    With a <where> clause: Replace (<cell/content>) in [<Column>] matching "value" with "value" where[<Column>]is "value"

  9. A history of all transforms you implement is displayed in History. Depending on your workflow and licensing, there are two ways to use the history:
    • Under History, there could be two levels of transformation logs recorded: transforms on the entire dataset, and transformations on the currently selected column. When reviewing All Transforms, hover over a specific transform to highlight the impacted column. You can roll back the change by either deleting the entries in the history or using the (Undo/Redo) buttons on the toolbar. You can remove transforms in the History panel out of sequential order provided that there are no dependencies. Note: Undo and Redo do not apply to modeling actions; only data transformations.
    • Under History, there will be only one log for all the transformations you have implemented if you are preparing data in a Story - you cannot isolate all transformations made on a specific column. In this scenario, to remove transforms use the (Undo/Redo) buttons in sequential order.
Completing data preparation

Once you've finished preparing your data and resolved all data or model quality issues, you're ready to create the new model or finish mapping the data you're adding to the model or story.

  1. a. If you're working with a dataset sample, choose Validate Data to apply your transforms across the entire dataset and check for data quality errors.
    Note
    Any newly detected errors will be retrieved in the sample you're working on, and appear under the Quality Issues or Model Requirements areas in the Details panel. You should resolve these errors before proceeding.
  2. b. Choose either Create Model to import the entire dataset and create your new model, or if you're adding data to a model or story, choose Finish Mapping.
    Note
    If you choose to create the model without resolving detected errors, you'll be asked if you wish to continue. Depending on the types of errors, either entire rows or invalid cells will be omitted from the new model.
    Note
    If any rows in the dataset were rejected, you can select Download Rejected Rows to save the rejected rows as a .csv file. You can then examine this file to see which data was rejected. You could fix the data in the .csv file and then import the .csv file into the model using the Import Data workflow, or fix the data in the source system.