Anonymization helps you gain statistically valid insights from your data while protecting
the privacy of individuals.
When analyzing data, you must ensure the privacy of personal or sensitive information. Removing
information that directly identifies an individual, such as a Social Security number
or a credit card number, provides a certain amount of privacy, but the remaining data could
still lead to an individual's re-identification. The Anonymization operator helps to
create anonymized groups, where you set the minimum number of records within the
group. You can also mask and generalize the data for a specified column. For
example, if you group individuals into age brackets of 20-29, 30-39, 40-49, and so
on, then it is more difficult to re-identify an individual. When you mask or
generalize multiple columns, the ability to re-identify an individual decreases further.
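The age-bracket idea can be illustrated with a minimal Python sketch. It is not the operator's implementation; the function name age_bracket and the width parameter are hypothetical and exist only to show how an exact age collapses into a 10-year bracket.

```python
def age_bracket(age: int, width: int = 10) -> str:
    """Map an exact age to a bracket label such as '20-29'.

    Hypothetical helper for illustration only.
    """
    if age < 0:
        raise ValueError("age must be non-negative")
    low = (age // width) * width          # lower bound of the bracket
    return f"{low}-{low + width - 1}"     # inclusive upper bound


ages = [23, 29, 31, 47]
brackets = [age_bracket(a) for a in ages]
print(brackets)
```

Two different ages (23 and 29) end up with the same label, which is what makes an individual record harder to single out.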
As you use the Anonymization operator, you'll see the following terms:
Sensitive: data that most individuals do not want known
about them, for example, salary information or an illness.
Nonsensitive: data that most individuals may not mind
sharing, for example, the country they live in.
Identifier: data that directly identifies individuals
such as their name or Social Security number.
Quasi-identifier: data that indirectly identifies an
individual, especially when combined with other quasi-identifiers, such as age,
gender, and postcode.
Configuration Parameters
Label (String): Required. Enter the name of the data mask operator.
Minimum rows in an anonymized group (Integer): Required. A number between 2 and 100. The
larger the number you enter, the less likely the data can be re-identified and the fewer
records are output. The lower the number you enter, the more likely the data can be
re-identified and the more records are output. Groups with fewer records than the number
you specify are not output.
Date Format (String): Required. Specifies the order in which month, day, and year
elements appear in the input string. This value is used only when the day, month, or year
in the input string is ambiguous.
Month Format (String): Required. Specifies the format in which the randomized month
is output when the software cannot determine the output month format based on the input alone.
Language (String): Required. Specifies the language that the software should use
when determining the output of an ambiguous input month string.
Century Threshold (Integer): Optional. Indicates whether a two-digit year is considered
part of the 20th or 21st century. Enter a value from 0-99. For example, when set to 25,
dates with a two-digit value from 00-25 result in the years 2000-2025. Dates with a
two-digit value of 26-99 result in the years 1926-1999.
Default Column Behavior (String): Required. Define whether to output any columns that are
not defined in the Column Definitions.
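The Century Threshold rule can be written out as a short Python sketch. The function name expand_two_digit_year is hypothetical; the logic follows only the example given for a threshold of 25 and is not taken from the operator's code.

```python
def expand_two_digit_year(yy: int, century_threshold: int = 25) -> int:
    """Expand a two-digit year using a century threshold (illustrative only).

    Values from 00 up to and including the threshold map to 20yy;
    values above the threshold map to 19yy.
    """
    if not 0 <= yy <= 99:
        raise ValueError("yy must be between 0 and 99")
    if not 0 <= century_threshold <= 99:
        raise ValueError("century_threshold must be between 0 and 99")
    return 2000 + yy if yy <= century_threshold else 1900 + yy


for yy in (0, 25, 26, 99):
    print(f"{yy:02d} -> {expand_two_digit_year(yy)}")
```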
Column Definitions
You can define the Anonymization operation on one or more
columns. Each column has its own definition. Click the
Open Editor icon, and then click
+Add item and complete the following
options:
Column ID (string): Required. This string uniquely
identifies the column. It should match the ID or name of
the column coming into the operator.
Column Designation (string): Required. Specifies the categorization and any masking or
generalization of this column:
Sensitive: Data is output
without modification, for example, height.
Nonsensitive: Data is
output without modification, for example, hair
color.
Quasi-identifier: Data is
output and used to form equivalence classes, for
example, age. You can further mask or generalize
the quasi-identifier columns.
Mask: Mask all or a
portion of the data with another character. For
example, a Social Security number might output as
***-**-1234.
Date Generalization:
Output date ranges into groups. For example,
divide subscribers into groups based on their
birth dates, and label the era (such as
Millennials, GenX, Baby Boomers, and so on) rather
than using the actual birth date.
Numeric Generalization:
Output number ranges into groups. For example,
output the records in a SALARY column that have
values between $42,000 and $125,000 into a group
called Middle Class.
Do Not Modify: Output
data without masking or generalization.
Identifier: Data can
positively identify an individual and is not
output, for example Social Security number.
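To make the designations concrete, the following Python sketch shows one way they could interact with the minimum group size. It is a simplification under stated assumptions, not the operator's algorithm: records are grouped by exact match on their quasi-identifier values, identifier columns are dropped, columns without a designation pass through unchanged, and groups smaller than min_rows are suppressed. The function name anonymize and the sample columns are hypothetical.

```python
from collections import defaultdict


def anonymize(records, designations, min_rows):
    """Group records into equivalence classes and suppress small groups.

    records:      list of dicts, one per row
    designations: dict mapping column name -> designation string
    min_rows:     minimum number of rows a group needs to be output
    """
    quasi = [c for c, d in designations.items() if d == "Quasi-identifier"]
    dropped = {c for c, d in designations.items() if d == "Identifier"}

    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[c] for c in quasi)  # equivalence class key
        # Identifier columns are never output.
        groups[key].append({c: v for c, v in rec.items() if c not in dropped})

    output = []
    for rows in groups.values():
        if len(rows) >= min_rows:  # groups below the minimum are not output
            output.extend(rows)
    return output


records = [
    {"name": "Ann", "age_bracket": "20-29", "country": "DE", "salary": 50000},
    {"name": "Bob", "age_bracket": "20-29", "country": "DE", "salary": 52000},
    {"name": "Cy", "age_bracket": "30-39", "country": "DE", "salary": 61000},
]
designations = {
    "name": "Identifier",
    "age_bracket": "Quasi-identifier",
    "country": "Quasi-identifier",
    "salary": "Sensitive",
}
result = anonymize(records, designations, min_rows=2)
print(result)
```

In this sample, the single record in the 30-39 group is suppressed because its group has fewer than two rows, and the name column does not appear in the output.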
Mask Options
Mask all or a portion of the data with another character. For example, a credit card
number might output as ****-****-****-1234.
Starting Position (String): Required. Specifies whether masking should start at the
beginning or end of the value.
Unmasked Length (String): Required. Specifies the number of characters at the beginning
or end of the value that should not be masked.
Masking Character (String): Required. The character or number that replaces the
characters in the input data, for example, "#" or "*".
Maintain Formatting (String): Required.
  True: Retains any special characters such as dashes, slashes, or periods, spaces
  between characters, and formatting in the output. For example, if you have a phone
  number that uses dashes, then the dashes are output.
  False: Replaces special characters and spaces with the designated masking character.
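The mask options can be approximated with a small Python sketch. It rests on assumptions that the text above does not confirm: when formatting is maintained, only letters and digits count toward the unmasked length, and when masking starts at the beginning, the unmasked characters are the ones at the end of the value. The function mask_value and its parameter names are hypothetical.

```python
def mask_value(value, unmasked_length, masking_char="*",
               start_at_beginning=True, maintain_formatting=True):
    """Mask a string while leaving some characters readable (illustrative only)."""
    if maintain_formatting:
        # Only letters and digits are candidates for masking or keeping.
        positions = [i for i, ch in enumerate(value) if ch.isalnum()]
    else:
        positions = list(range(len(value)))

    if unmasked_length <= 0:
        keep = set()
    elif start_at_beginning:
        keep = set(positions[-unmasked_length:])  # readable tail
    else:
        keep = set(positions[:unmasked_length])   # readable head

    out = []
    for i, ch in enumerate(value):
        if i in keep:
            out.append(ch)
        elif maintain_formatting and not ch.isalnum():
            out.append(ch)            # dashes, spaces, and so on survive
        else:
            out.append(masking_char)
    return "".join(out)


print(mask_value("123-45-6789", 4))
print(mask_value("1234-5678-9012-1234", 4))
print(mask_value("123-45-6789", 4, maintain_formatting=False))
```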
Date Generalization Options
Output date ranges into groups.
Auto Range Scale (String): Required. Defines the scale on which to base the auto range.
  Not in Use: Indicates that you are not using auto range for the specified input column.
  This setting is appropriate when you complete the Range Definition options for the input
  column, or when you do not use this feature. Click + Add item to further define the option.
  Calendar Year: Group records based on the calendar year. The software defines a calendar
  year as 1/1/yyyy to 12/31/yyyy.
  Calendar Month: Group records based on the calendar month. The software defines a calendar
  month as mm/01/yyyy to mm/eom/yyyy, where "eom" is end of month.
Minimum Date (String): Enter the lowest acceptable date in the range.
Minimum Date Inclusive (String): Required. Select True when you want to include the minimum
date. Select False when you do not want to include the minimum date in the results. For
example, if you set the minimum value to 12/31/2020, then 12/31/2020 is included in the
results when True is selected.
Maximum Date (String): Enter the highest acceptable date in the range.
Maximum Date Inclusive (String): Required. Select True when you want to include the maximum
date. Select False when you do not want to include the maximum date in the results. For
example, if you set the maximum date to 06/30/2020, then dates through 06/29/2020 are
included in the results when False is selected.
Replacement Value (String): Required. Enter a value to describe the group.
Default Replacement Value (String): Optional. Value to output when the input value does not
fall into any of the defined ranges.
Auto Range Duration (Integer): Required. Number of years or months to include in the range.
Auto Range Start Date (String): Required. Starting date in auto range.
Auto Range End Date (String): Required. Ending date in auto range.
Auto Range Output Format (String): Required. Determines the format of the output Auto Range
Replacement Value.
Auto Range Year Format (String): Required. Specifies the number of digits to use for the
year. Full Year outputs a four-digit number, for example, 2018. Short Year outputs a
two-digit number, for example, 18.
Auto Range Month Format (String): Required. Determines the month format to use in the Auto
Range Replacement Value. Full Text outputs the month name, for example, January. Short Text
outputs the abbreviated month name, for example, Jan. Numeric outputs the number of the
month, for example, 1 for January.
Auto Range Date Delimiter (String): Required. Determines the delimiter to use in the Auto
Range Replacement Value.
Auto Range Numeric Format (String): Optional. Determines the numeric format to use in the
Auto Range Replacement Value.
Auto Range Enable Zero Pad (String): Optional. Pad a one-digit number with zero when the
format includes the month and day. For example, 1/5/2018 changes to 01/05/2018 when set to
True.
Auto Range Output Language (String): Optional. Determines the language to use in the Auto
Range Replacement Value. This setting is applicable when the Month Format is set to Short
Text or Full Text.
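The calendar year and calendar month definitions used by Auto Range Scale can be sketched in Python as follows. This only illustrates how the bounds of a one-year or one-month bucket could be derived from a date; it does not cover Auto Range Duration or the output formatting options, and the function name auto_range_bounds is hypothetical.

```python
import calendar
from datetime import date
from typing import Tuple


def auto_range_bounds(d: date, scale: str) -> Tuple[date, date]:
    """Return the first and last day of the bucket that contains d."""
    if scale == "Calendar Year":
        # 1/1/yyyy to 12/31/yyyy
        return date(d.year, 1, 1), date(d.year, 12, 31)
    if scale == "Calendar Month":
        # mm/01/yyyy to mm/eom/yyyy; monthrange gives the number of days
        last_day = calendar.monthrange(d.year, d.month)[1]
        return date(d.year, d.month, 1), date(d.year, d.month, last_day)
    raise ValueError(f"unsupported scale: {scale!r}")


start, end = auto_range_bounds(date(2020, 2, 15), "Calendar Month")
print(start.isoformat(), end.isoformat())
```

Using calendar.monthrange keeps the end-of-month value correct for leap years, which a fixed table of month lengths would not.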
Numeric Generalization Options
Output number ranges into groups. For example, output the records in an
AGE column that have values between 13-19 into
a group called Teenager. Specify the ranges to use for numeric
variance. In the Numeric Generalization option, select + Add
item.
Minimum Value (Integer): Enter the lowest acceptable value in the range.
Minimum Value Inclusive (String): Required. Select True when you want to include the minimum
value. Select False when you do not want to include the minimum value in the results. For
example, if you set the minimum value to 30, then 30 is included in the results when True
is selected.
Maximum Value (Integer): Enter the highest acceptable value in the range.
Maximum Value Inclusive (String): Required. Select True when you want to include the maximum
value. Select False when you do not want to include the maximum value in the results. For
example, if you set the maximum value to 50, then numbers through 49 are included in the
results when False is selected.
Replacement Value (String): Optional. Enter a value to describe the group.
Default Replacement Value (String): Optional. Value to output when the input value does not
fall into any of the defined ranges. For example, you might want to label those records as
Exceptions.
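The way the inclusive flags and the default replacement value interact can be shown with a minimal Python sketch. The class NumericRange and the function generalize are hypothetical names, and the "30-49" label is invented for the example; only the Teenager and Exceptions labels come from the text above.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class NumericRange:
    minimum: float
    maximum: float
    replacement: str
    min_inclusive: bool = True
    max_inclusive: bool = True

    def contains(self, value: float) -> bool:
        """Check the value against both bounds, honoring the inclusive flags."""
        above = value >= self.minimum if self.min_inclusive else value > self.minimum
        below = value <= self.maximum if self.max_inclusive else value < self.maximum
        return above and below


def generalize(value: float, ranges: Sequence[NumericRange],
               default: Optional[str] = None) -> Optional[str]:
    """Return the replacement value of the first matching range, else the default."""
    for r in ranges:
        if r.contains(value):
            return r.replacement
    return default


ranges = [
    NumericRange(13, 19, "Teenager"),
    NumericRange(30, 50, "30-49", max_inclusive=False),  # 50 itself is excluded
]
for age in (13, 19, 30, 49, 50):
    print(age, generalize(age, ranges, default="Exceptions"))
```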
Numeric Generalization Example
Let's say that you want to assign employees to one of three geographic areas based on
their employee number. You would add three items and complete the options as follows.