Modeling Guide

Data Mask

Protect the personally identifiable or sensitive information by covering all or a portion of the data.

Some examples of personal and sensitive data include credit card numbers, birth dates, tax identification numbers, salary information, medical identification numbers, bank account numbers, and so on. Use data masking to support security and privacy policies, and to protect your customer or employee data from possible theft or exploitation.

Place the Data Mask operator toward the end of your graph to ensure that all columns that are to be masked have undergone processing by upstream operators. If you place Data Mask before other operators, the downstream operators may not process the actual data but rather the masked data, and in some cases, the operator won’t be able to process the columns at all if Data Mask replaced input data with blanks or a masking character such as “#”.

Configuration Parameters

Parameter Type Description
Label String Mandatory. Enter the name of the data mask operator.
Seed String Optional. An alpha and/or numeric string. Set this option to mask the data in a way that ensures consistent output values each time the data is output. One seed value maintains referential integrity for the following variance types set up in the Data Mask transform: Number Variance, Date Variance, and Pattern Variance.
Date Format String Required. Specifies the order in which month, day, and year elements appear in the input string. This value is used only when the day, month, or year in the input string is ambiguous.
Month Format String Required. Specifies the format in which the randomized month is output when the software cannot determine the output month format based on the input alone.
Language String Required. Specifies the language that the software should use when determining the output of an ambiguous input month string.
Century Threshold String Optional. Indicates whether a two-digit date is considered part of the 20th or 21st century. Enter a value from 0-99. For example, when set to 25, the dates with a 2-digit value from 00-25 result in the years 2000-2025. Dates with a 2-digit value of 26-99 result in the years 1926-1999.
Column Definitions   You can define the mask operation on one or more columns. Each column has its own definition. Click the Open Editor icon, and then click +Add item and complete the following options:
  1. Column ID (string): Required. This string uniquely identifies the column. It should match the ID or name of the column coming into the operator.
  2. Operation (string): Required. Specifies the type of masking operation for this column: Mask, Pattern Variance, Numeric Variance, Numeric Generalization, Date Variance, or Date Generalization.

Mask Options

Mask all or a portion of the data with another character. For example, a credit card number might output as ****-****-****-1234.
Parameter Type Description
Starting Position String Required. Specifies whether masking should start at the beginning or end of the value.
Unmasked Length String Required. Specifies the number of characters at the beginning or end of the value that should not be masked.
Masking Character String Required. The character or number that replaces the characters in the input data, for example, "#" or "*".
Maintain Formatting String Required.
  • True: retains any special characters such as dashes, slashes or periods, spaces between characters, and formatting in the output. For example, if you have a phone number that uses dashes, then the dashes are output.
  • False: replaces special characters and spaces with the designated masking character.

Pattern Variance Options

Mask an input substring with a specific pattern. For example, using the part number ABC123GHI, mask the first three characters with ZYW, mask the next three characters with 999, and preserve the final three characters as input. The result would be ZYW999GHI. In the Pattern Variance Definition option, select + Add item.
Parameter Type Description
Variance type String Required. Choose one of the options.
  • Default: Masks each applicable character with the like characters for alpha and numeric content. Retains spaces and special characters.
  • Preserve: Outputs the defined substring the same as it is input.
  • Character: Masks the defined substring by randomly placing each of the characters in the defined substring with the values that you specify in the Value setting. Retains spaces and special characters.
  • String: Masks the defined substring by randomly replacing the entire substring with values that you specify in the Value option. Does not retain spaces or special characters.
Starting Position Integer Required. A positive integer that indicates the character number where masking should start. Alpha, numeric, space, and other printable characters are included in the position count.
Length Integer Required. A positive integer that indicates the number of positions (characters) to mask.
Value String Required. When parsing the string for a value, it trims leading and trailing spaces from the value. You can maintain the spaces by surrounding the value in framing characters such as double quotes. For example, STRING, "John,Smith". If the value has double quotes, you can escape the double quotes using a backslash. For example, "\"Slim\"". If the value with framing characters has a backslash in it, the backslash can be escaped with an additional backslash. For example, "\\path".

Numeric Variance Options

Output randomized numbers. For example, change the input salary of 50,000 to a random number between 45,000-55,000.
Parameter Type Description
Numeric Variance Type String Required. Define how you want to vary a number.
  • Percentage: Varies the data by a percentage that is within a calculated minimum and maximum range.
  • Fixed Number: Varies the data by a fixed number that is within a calculated minimum and maximum range.
  • Range: Varies the data that is greater than or equal to the user-defined minimum value and less than or equal to the user-defined maximum values that you set. You must set the minimum and maximum values.
Numeric Variance Number Required. Determines the number by which to randomize the input. Enter a value greater than zero.
Minimum Value Number Required. Determines the number by which to randomize the input. Enter the lowest value that can be output as a whole number or decimal. Negative decimal numbers are supported. For best results, set a realistic maximum value.
Maximum Value Number Required. Enter the highest value that can be output as a whole number or decimal. Negative decimal numbers are supported. For best results, set a realistic maximum value.

Numeric Generalization Options

Output numbers ranges into groups. For example, output the records in an AGE column that have values between 13-19 into a group called Teenager. Specify the ranges to use for numeric variance. In the Numeric Generalization option, select + Add item.
Parameter Type Description
Minimum Value Integer Enter the lowest acceptable value in the range.
Minimum Value Inclusive String Required. Select True when you want to include the minimum value. Select False when you do not want to include the minimum value in the results. For example, if you set the minimum value to 30, then 30 is included in the results when True is selected.
Maximum Value Integer Enter the highest acceptable value in the range.
Maximum Value Inclusive String Required. Select True when you want to include the maximum value. Select False when you do not want to include the minimum value in the results. For example, if you set the maximum value to 50, then numbers through 49 are included in the results when False is selected.
Replacement Value String Optional. Enter a value to describe the group.
Default Replacement Value String Optional. Value to output when the input value does not fall into any of the defined ranges. For example, if you might want to label those records as Exceptions.

Numeric Generalization Example

Let's say that you want to assign employees to one of three geographic areas based on their employee number. You would add three items and complete the options as follows.

Date Variance Options

Output randomized dates. For example, change the input date of 01/15/2017 to a random date between 01/01/2017-01/31/2017.
Parameter Type Description
Date Variance Type String Specifies how you want to vary a date.
  • Range: Varies the date within the user-defined minimum and maximum dates that you set. You must set the Minimum and Maximum date values.
  • Fixed Days: Varies the date by a fixed number of days that occur before or after the input date.
  • Fixed Months: Varies the date by a fixed number of months that occur before or after the input month.
  • Fixed Years: Varies the date by a fixed number of years that occur before or after the input year.
Date Variance Number Required. Determines the number of days, months, or years by which to randomize the input. The value must be greater than zero.
Minimum Date String Required for Range; optional for other types. Specify the minimum date allowed on output.
Maximum Date String Required for Range; optional for other types. Specify the maximum date allowed on output.

Date Generalization Options

Output date ranges into groups.
Parameter Type Description
Auto Range Scale String Required. Defines the scale on which to base the auto range.
  • Not in Use: Indicates that you are not using auto range for the specified input column. This setting is appropriate when you complete the Range Definition options for the input column, or when you do not use this feature. Click + Add item to further define the option.
  • Calendar Year: Group records based on the calendar year. The software defines a calendar year as 1/1/yyyy to 12/31/yyyy.
  • Calendar Month: Group records based on the calendar month. The software defines a calendar month as mm/01/yyyy to mm/eom/yyyy, where "eom" is end of month.
Minimum Date String Enter the lowest acceptable date in the range.
Minimum Date Inclusive String Required. Select True when you want to include the minimum date. Select False when you do not want to include the minimum date in the results. For example, if you set the minimum value to 12/31/2020, then 12/31/2020 is included in the results when True is selected.
Maximum Date String Enter the highest acceptable date in the range.
Maximum Date Inclusive String Required. Select True when you want to include the maximum date. Select False when you do not want to include the minimum date in the results. For example, if you set the maximum date to 06/30/2020, then dates through 06/29/2020 are included in the results when False is selected.
Replacement Value String Required. Enter a value to describe the group.
Default Replacement Value String Optional. Value to output when the input value does not fall into any of the defined ranges.
Auto Range Duration Integer Required. Number of years or months to include in the range.
Auto Range Start Date String Required. Starting date in auto range.
Auto Range End Date String Required. Ending date in auto range.
Auto Range Output Format String Required. Determines the format of the output Auto Range Replacement Value.
Auto Range Year Format String Required. Specifies the number of digits to use for the year. Full Year outputs a four-digit number, for example, 2018. Short Year outputs a two-digit number, for example, 18.
Auto Range Month Format String Required. Determines the month format to use in the Auto Range Replacement Value. Full Text outputs the month name, for example, January. Short Text outputs the abbreviated month name, for example, Jan. Numeric outputs the number of the month, for example, 1 for January.
Auto Range Date Delimiter String Required. Determines the delimiter to use in the Auto Range Replacement Value.
Auto Range Numeric Format String Optional. Determines the numeric format to use in the Auto Range Replacement Value.
Auto Range Enable Zero Pad String Optional. Pad a one-digit number with zero when the format includes the month and day. For example, 1/5/2018 changes to 01/05/2018 when set to True.
Auto Range Output Language String Optional. Determines the language to use in the Auto Range Replacement Value. This setting is applicable when the Month Format is set to Short Text or Full Text.

Input

Input Type Description
Input Message The input is expected to be in JSON format.

Output

Output Type Description
Output BLOB The output is in JSON format.