Modeling Guide

Numeric Generalization

Group numeric values into ranges and output a masked value for each range.

Use numeric generalization when you want to place ranges of numeric data into groups. For example, you might want to divide ages from 0 to 100 into groups of 10 years. You can create the groups in one of two ways: generate them from a defined range, or define them one at a time.

Define a Group with a Defined Range

In this method, you define the start and end of a numeric range and choose how the numbers should be grouped. Then you enter a masked value for each of the generated groups.

  1. Select a column and click the wrench icon. Choose Numeric Generalization.
  2. Select the link, defining a group.
  3. In the Grouping dialog, enter the Start Number and End Number.
  4. In the By option, enter the size of each group. For example, if you are dividing ages in the range 0-100 and you want a group for each decade, enter 10.
  5. Click Create. The groups are created.
  6. Enter a masked value for each group. Continuing with the age example, you might enter child, teen, 20's, 30's, and so on. By default, both ends of each group are inclusive: a value belongs to the group when it is greater than or equal to the minimum and less than or equal to the maximum (shown as <= on each side of the column name). To exclude the beginning or ending number of a group, choose the < symbol on that side instead.
  7. Enter a Default Masked Value to place a masked value on any numbers that are not defined in the groups. For example, if you have some people who are over 100 years old, you might enter a Default Masked Value of 100+.
  8. Click Save, and then Apply to return to the flowgraph editor.
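
The effect of the Start Number, End Number, and By options can be sketched in a few lines of Python. This is only an illustration of the idea, not the product's implementation: the function names build_groups and mask are hypothetical, and the sketch assumes that each generated group includes its lower bound and excludes its upper bound, which may differ from the boundaries the Grouping dialog actually generates.

```python
def build_groups(start, end, by):
    """Split the range start..end into consecutive groups of width `by`.

    Assumption: each group is (low, high) with low included and high excluded.
    """
    if by <= 0:
        raise ValueError("by must be positive")
    groups = []
    low = start
    while low < end:
        high = min(low + by, end)  # the last group never runs past `end`
        groups.append((low, high))
        low = high
    return groups


def mask(value, groups, labels, default):
    """Return the masked value of the group containing `value`, else `default`."""
    for (low, high), label in zip(groups, labels):
        if low <= value < high:
            return label
    return default


# Ages 0-100 in groups of 10, with the masked values from the age example.
groups = build_groups(0, 100, 10)
labels = ["child", "teen", "20's", "30's", "40's",
          "50's", "60's", "70's", "80's", "90's"]

print(mask(37, groups, labels, "100+"))   # 30's
print(mask(104, groups, labels, "100+"))  # 100+
```

Any number that no group covers, such as 104 here, falls through to the default masked value.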

Define a Group One at a Time

In this method, you define and name each group individually. For example, if you want to create groups of unequal size, it might be easier to set them up one at a time. Perhaps you want the following groups to target age-appropriate markets:
  • 18-24
  • 25-34
  • 35-44
  • 45-65

  1. Select a column and click the wrench icon. Choose Numeric Generalization.
  2. Click the + icon.
  3. Enter a minimum value, and then choose whether the specified number is included in the results (value greater than or equal to the minimum, shown as <=) or the group starts with the following number (value greater than the minimum, shown as <).
  4. Enter the maximum value, and then choose whether the specified number is included in the results (value less than or equal to the maximum, shown as <=) or the group ends on the previous number (value less than the maximum, shown as <).
  5. Enter a masked value for the group.
  6. Click the + icon to add more groups.
  7. Enter a Default Masked Value to place a masked value on any numbers that are not defined in the groups. For example, if you have ages under 18 or over 65, you might enter a Default Masked Value of DoNotMarket.
  8. Click Save, and then Apply to return to the flowgraph editor.
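
The following Python sketch shows how individually defined groups with inclusive or exclusive bounds might be evaluated. It is a minimal illustration under stated assumptions, not the product's implementation: the Group class and the generalize function are hypothetical names, the masked values simply reuse the range labels, and the sketch assumes that the first matching group wins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Group:
    minimum: float
    maximum: float
    masked_value: str
    include_min: bool = True  # True: minimum <= value, False: minimum < value
    include_max: bool = True  # True: value <= maximum, False: value < maximum

    def contains(self, value):
        above = value >= self.minimum if self.include_min else value > self.minimum
        below = value <= self.maximum if self.include_max else value < self.maximum
        return above and below


def generalize(value, groups, default_masked_value):
    """Return the masked value of the first group containing `value`.

    Assumption: groups are checked in the order they were defined.
    """
    for group in groups:
        if group.contains(value):
            return group.masked_value
    return default_masked_value


# The four marketing age groups, all with inclusive bounds (the default).
AGE_GROUPS = [
    Group(18, 24, "18-24"),
    Group(25, 34, "25-34"),
    Group(35, 44, "35-44"),
    Group(45, 65, "45-65"),
]

print(generalize(30, AGE_GROUPS, "DoNotMarket"))  # 25-34
print(generalize(17, AGE_GROUPS, "DoNotMarket"))  # DoNotMarket
```

Setting include_min or include_max to False corresponds to choosing the < symbol for that end of the group.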

Example

Let's say that you want to categorize your employee numbers by location. Instead of the employee number, the output contains the masked value defined for the range that the value in the EMPNO column falls into. Any number that does not fall into one of the ranges receives the default masked value, "Outliers".

Minimum       Column Name       Maximum    Masked Value
120000   <=   EMPNO        <=   169999     Asia_Pac
170000   <=   EMPNO        <=   299999     Europe_Africa
230000   <=   EMPNO        <=   289999     Americas