Date Generalization
Output date ranges into groups.
Date generalization is useful when you want to place a range of date data into a group. For example, you might want to define a group of yearly, or monthly data. You might want to choose three months of data to define as a quarter, or ten years of data for a particular decade. However you want the data defined, you can implement the groups in one of three ways: define your group with a defined range, define the group with an automatic range, or define the groups one at a time.
Define a Group with a Defined Range
In this method, you define a starting and ending date range and choose how the dates should be grouped. Then you can enter a masked value for the defined ranges.
- Select a column and click the wrench icon. Choose Date Generalization.
- Select the link, defining a group.
- In the Grouping dialog, enter the Start Date and Ending Date by either typing the dates, or clicking the calender icon to find the date on the calendar.
- In the By option, enter the number for how you want the data grouped. For example, if you are dividing the year into quarters, you would choose 3 Months to create four groups, one for each quarter.
- Choose Day, Month, or Year to have the dates grouped by that option.
- Click Create. The groups are created.
- Enter a masked value for each group. Continuing with the four quarters example, you might enter Q1, Q2, Q3, and Q4. By default, the values are greater than or equal to, and less than or equal to. You can change these inclusive values to exclude either the beginning or ending dates.
- Enter a Default Masked Value. This places a masked value on any dates that are not defined in the groups. For example, if your four quarters were for the year 2017, and you have dates from 2016, you might enter a Default Masked Value to mask those records as 2016, or perhaps you want them listed as Undefined.
- Click Save, and then Apply to return to the flowgraph editor
Define a Group with an Auto Range
In this method, you do not name the groups, but define the duration, inclusive dates, and format.
- Select a column and click the wrench icon. Choose Date Generalization.
- Click Auto Range.
- Enter the amount of time you want to encompass by entering a number in the Duration option, and choosing either Year or Month.
- Enter the Start Date and Ending Date by either typing the dates, or clicking the calender icon to find the date on the calendar.
- Choose how you want the date value formatted in the Output Date Format option.
- Choose the character you want to separate the month, day, and year in the Output Date Delimiter option.
- Enter a Default Masked Value. This places a masked value on any dates that are not defined in the groups. For example, if you defined four quarters for the year 2017, and you have dates from 2016, you might enter a Default Masked Value to mask those records as 2016, or perhaps you want them listed as Undefined.
- Click Save, and then Apply to return to the flowgraph editor.
Define a Group One at a Time
In this method, you define and name each group individually.
- Select a column and click the wrench icon. Choose Date Generalization.
- Click the + icon.
- Enter the minimum value (start date), and then choose whether you want the date specified included in the results (greater than or equal symbol <=), or to start on the following day (greater than symbol <).
- Enter the maximum value (ending date), and then choose whether to include the date specified in the results (less than or equal symbol >=), or to end on the previous day (less than symbol >).
- Enter a masked value for each group.
- Click the + icon to add more groups.
- Enter a Default Masked Value. This places a masked value on any dates that are not defined in the groups. For example, if you defined four quarters for the year 2017, and you have dates from 2016, you might enter a Default Masked Value to mask those records as 2016, or perhaps you want them listed as Undefined.
- Click Save, and then Apply to return to the flowgraph editor.
Example
Let's say that you want to divide subscribers into groups based on their birth dates, and that you want to label the era rather than using the actual birth date. Because the age groups are not an equal number of years, it would be best to manually define each range one at a time rather than using the auto range. The default masked value is defined as "Out of Range", so that any records that do not belong in the defined ranges will output "Out of Range" as the masked value.| Minimum | Column Name | Maximum | Masked Value |
|---|---|---|---|
| 1940.01.01 <= | BIRTHDATE | <= 1964.12.31 | Baby Boomers |
| 1965.01.01 <= | BIRTHDATE | <= 1976.12.31 | Gen X |
| 1977.01.01 <= | BIRTHDATE | <= 1995.12.31 | Millennials |
