Modeling Guide

Data Mask

Protect personally identifiable or sensitive information by covering all or a portion of the data.

Some examples of personal and sensitive data include credit card numbers, birth dates, tax identification numbers, salary information, medical identification numbers, and bank account numbers. Use data masking to support security and privacy policies, and to protect your customer or employee data from possible theft or exploitation.

Placement in the flowgraph

Place the Data Mask node toward the end of your flowgraph so that all columns to be masked have already been processed by upstream nodes. If you place the Data Mask node before other nodes, the downstream nodes process the masked data rather than the actual data. In some cases, a downstream node cannot process the columns at all, for example when the Data Mask node has replaced the input data with blanks or with a masking character such as "#".
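The following Python sketch is a hypothetical illustration of the placement problem, not the Data Mask node's implementation. A downstream step that sums salary strings works on the real values but fails once the values have been replaced with "#" characters. The function names `mask_all` and `downstream_total` are invented for this example.

```python
# Hypothetical illustration: masking too early breaks downstream processing.


def mask_all(value: str, mask_char: str = "#") -> str:
    """Replace every character of the value with the masking character."""
    return mask_char * len(value)


def downstream_total(salaries: list[str]) -> int:
    """A downstream step that expects numeric strings and sums them."""
    return sum(int(s) for s in salaries)


salaries = ["50000", "62000"]

# Recommended order: process the real data first, then mask the output column.
total = downstream_total(salaries)
masked = [mask_all(s) for s in salaries]
print(total, masked)  # 112000 ['#####', '#####']

# Masking first: the downstream step receives "#####" and cannot parse it.
try:
    downstream_total(masked)
except ValueError as exc:
    print("downstream step failed:", exc)
```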

Several masking types are available. The types you can use depend on the data type of the column that you want to mask.

Mask: Mask all or a portion of the data with another character. For example, a credit card number might be output as ****-****-****-1234.
Date Generalization: Group dates into ranges, either automatically or manually. For example, output the records with dates between 01/01/2017 and 03/31/2017 into a group called "Quarter1".
Date Variance: Output randomized dates. For example, change the input date of 01/15/2017 to a random date between 01/01/2017 and 01/31/2017.
Numeric Generalization: Group numbers into ranges. For example, output the records in an AGE column that have values between 13 and 19 into a group called "Teenager".
Numeric Variance: Output randomized numbers. For example, change the input salary of 50,000 to a random number between 45,000 and 55,000.
Pattern Variance: Mask input substrings with specific patterns. For example, in the part number ABC123GHI, mask the first three characters with ZYW, mask the next three characters with 999, and preserve the final three characters. The result is ZYW999GHI.
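To make the masking types concrete, the following Python sketch approximates each one on plain values. It is not how the Data Mask node is implemented. All function names, the calendar-quarter grouping, the "Child" and "Adult" labels, and the percentage-based variance are assumptions made for illustration.

```python
import random
from datetime import date, timedelta
from typing import Optional


def mask(value: str, keep_last: int = 4, mask_char: str = "*") -> str:
    """Mask: replace all but the last `keep_last` alphanumeric characters,
    leaving separators such as '-' in place."""
    to_mask = sum(ch.isalnum() for ch in value) - keep_last
    out = []
    for ch in value:
        if ch.isalnum() and to_mask > 0:
            out.append(mask_char)
            to_mask -= 1
        else:
            out.append(ch)
    return "".join(out)


def date_generalization(d: date) -> str:
    """Date Generalization (assumed automatic grouping): calendar quarter."""
    return f"Quarter{(d.month - 1) // 3 + 1}"


def date_variance(d: date, rng: random.Random) -> date:
    """Date Variance: pick a random date within the same calendar month."""
    first = d.replace(day=1)
    next_month = (first + timedelta(days=32)).replace(day=1)
    days_in_month = (next_month - first).days
    return first + timedelta(days=rng.randrange(days_in_month))


def numeric_generalization(age: int) -> str:
    """Numeric Generalization: map a number to a named range
    ("Child" and "Adult" are hypothetical labels)."""
    ranges = [(0, 12, "Child"), (13, 19, "Teenager"), (20, 150, "Adult")]
    for low, high, label in ranges:
        if low <= age <= high:
            return label
    return "Unknown"


def numeric_variance(value: float, percent: float, rng: random.Random) -> float:
    """Numeric Variance: random value within +/- `percent` of the input."""
    delta = value * percent / 100
    return rng.uniform(value - delta, value + delta)


def pattern_variance(value: str, segments: list[tuple[int, Optional[str]]]) -> str:
    """Pattern Variance: replace fixed-length substrings in order.
    A replacement of None preserves that substring as input."""
    out, pos = [], 0
    for length, replacement in segments:
        chunk = value[pos:pos + length]
        out.append(chunk if replacement is None else replacement)
        pos += length
    out.append(value[pos:])  # keep any remaining characters unchanged
    return "".join(out)


rng = random.Random(42)
print(mask("1111-2222-3333-1234"))             # ****-****-****-1234
print(date_generalization(date(2017, 2, 14)))  # Quarter1
print(date_variance(date(2017, 1, 15), rng))   # a date in January 2017
print(numeric_generalization(16))              # Teenager
print(numeric_variance(50000, 10, rng))        # between 45,000 and 55,000
print(pattern_variance("ABC123GHI", [(3, "ZYW"), (3, "999"), (3, None)]))  # ZYW999GHI
```

Passing a seeded `random.Random` makes the variance results repeatable, which is convenient when checking the sketch.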

The following column types and data types are supported for masking. Each entry lists the column type, its data types in parentheses, and the masking types that it supports.

Character (alphanum, nvarchar, shorttext, and varchar): Mask, Date Generalization, Numeric Generalization, Pattern Variance
Date (date, seconddate, and timestamp): Date Generalization, Date Variance
Numeric (bigint, decimal, double, integer, real, smalldecimal, smallint, and tinyint): Numeric Generalization, Numeric Variance
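The support table can also be read as a lookup. The following Python sketch, with hypothetical names, checks whether a masking type is allowed for a given data type. It only restates the table and is not part of any SAP API.

```python
# Hypothetical lookup that restates the data type support table.
CHARACTER_TYPES = {"alphanum", "nvarchar", "shorttext", "varchar"}
DATE_TYPES = {"date", "seconddate", "timestamp"}
NUMERIC_TYPES = {"bigint", "decimal", "double", "integer", "real",
                 "smalldecimal", "smallint", "tinyint"}

RULES_BY_COLUMN_TYPE = {
    "Character": {"Mask", "Date Generalization",
                  "Numeric Generalization", "Pattern Variance"},
    "Date": {"Date Generalization", "Date Variance"},
    "Numeric": {"Numeric Generalization", "Numeric Variance"},
}


def column_type(data_type: str) -> str:
    """Map a data type to its column type, or reject unsupported types."""
    dt = data_type.lower()
    if dt in CHARACTER_TYPES:
        return "Character"
    if dt in DATE_TYPES:
        return "Date"
    if dt in NUMERIC_TYPES:
        return "Numeric"
    raise ValueError(f"{data_type} is not supported for masking")


def is_rule_allowed(data_type: str, rule_type: str) -> bool:
    """True if the masking type is supported for the given data type."""
    return rule_type in RULES_BY_COLUMN_TYPE[column_type(data_type)]


print(is_rule_allowed("nvarchar", "Pattern Variance"))  # True
print(is_rule_allowed("decimal", "Mask"))                # False
```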

To configure the Data Mask node:

  1. Place the Data Mask node onto the canvas. Click the gear icon. The columns available for masking are shown.
  2. In the Data Mask Rule column, click the wrench icon for the column that contains the data you want masked.
  3. Select the type of masking and configure the settings. For a description of the options, see the separate Mask, Date Generalization, Date Variance, Numeric Generalization, Numeric Variance, and Pattern Variance Type topics.
  4. Click Apply, and then click Back to return to the entire flowgraph.

To edit or delete masking rules:

  1. Click the wrench icon next to the rule that you want to change or delete.
  2. To change the rule, click Edit Rule. Make the appropriate changes, and then click Apply.

    To delete the rule, click Remove Rule.