Data Mask

Protect personally identifiable or sensitive information by covering all or a portion of the data.

Some examples of personal and sensitive data include credit card numbers, birth dates, tax identification numbers, salary information, medical identification numbers, and bank account numbers. Use data masking to support security and privacy policies, and to protect your customer or employee data from possible theft or exploitation.

Placement in the flowgraph

Place the Data Mask node toward the end of your flowgraph so that all columns to be masked have already been processed by upstream nodes. If you place the Data Mask node before other nodes, the downstream nodes process the masked data rather than the actual data. In some cases, a downstream node cannot process the columns at all, for example when the Data Mask node has replaced the input data with blanks or a masking character such as “#”.
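The ordering concern can be illustrated outside the product with a minimal Python sketch. The helpers `mask_all` and `parse_salary` are hypothetical and are not part of the Data Mask node; they only stand in for a masking step and a downstream processing step.

```python
def mask_all(value: str, mask_char: str = "#") -> str:
    """Hypothetical stand-in for a masking step: replace every character."""
    return mask_char * len(value)


def parse_salary(value: str) -> int:
    """Hypothetical downstream step that needs the actual data."""
    return int(value.replace(",", ""))


raw = "50,000"

# Process first, mask last: the processing step sees the actual value.
processed = parse_salary(raw)
masked_after = mask_all(raw)

# Mask first: the processing step receives "######" and cannot parse it.
try:
    parse_salary(mask_all(raw))
    failed = False
except ValueError:
    failed = True
```

In the first ordering both the processed value and the masked output are available; in the second, the downstream step fails because only mask characters reach it.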

There are several types of masking available, depending on the content type of the columns that you want to mask.

Masking Type	Description
Mask	Mask all or a portion of the data with another character. For example, a credit card number might output as ****-****-****-1234.
Date Generalization	Output date ranges into groups either automatically or manually. For example, output the records with dates between 01/01/2017 and 03/31/2017 into a group called "Quarter1".
Date Variance	Output randomized dates. For example, change the input date of 01/15/2017 to a random date between 01/01/2017 and 01/31/2017.
Numeric Generalization	Output number ranges into groups. For example, output the records in an AGE column that have values between 13 and 19 into a group called "Teenager".
Numeric Variance	Output randomized numbers. For example, change the input salary of 50,000 to a random number between 45,000 and 55,000.
Pattern Variance	Mask an input substring with a specific pattern. For example, using the part number ABC123GHI, mask the first three characters with ZYW, mask the next three characters with 999, and preserve the final three characters as input. The result would be ZYW999GHI.
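
To make the masking types more concrete, the following Python sketch reproduces several of the examples from the table. It is an illustration only and does not show how the Data Mask node is implemented. All function names and parameters are hypothetical, the "Other" label is an assumption for values outside the example range, and the date example assumes a variance of 14 days in either direction.

```python
import random
from datetime import date, timedelta
from typing import Optional, Sequence, Tuple


def mask_keep_last(value: str, keep: int = 4, mask_char: str = "*") -> str:
    """Mask: hide every letter or digit except the last `keep`; keep separators."""
    out = []
    kept = 0
    for ch in reversed(value):
        if ch.isalnum() and kept < keep:
            out.append(ch)
            kept += 1
        elif ch.isalnum():
            out.append(mask_char)
        else:
            out.append(ch)  # separators such as "-" are preserved
    return "".join(reversed(out))


def generalize_age(age: int) -> str:
    """Numeric Generalization: map a range of numbers to a group label."""
    if 13 <= age <= 19:
        return "Teenager"
    return "Other"  # assumption: label for values outside the example range


def vary_number(value: float, percent: float, rng: random.Random) -> float:
    """Numeric Variance: random number within +/- percent of the input."""
    spread = value * percent / 100
    return rng.uniform(value - spread, value + spread)


def vary_date(value: date, max_days: int, rng: random.Random) -> date:
    """Date Variance: random date within +/- max_days of the input."""
    return value + timedelta(days=rng.randint(-max_days, max_days))


def pattern_variance(
    value: str, segments: Sequence[Tuple[int, Optional[str]]]
) -> str:
    """Pattern Variance: replace each segment, or preserve it when None."""
    out = []
    pos = 0
    for length, replacement in segments:
        chunk = value[pos:pos + length]
        out.append(chunk if replacement is None else replacement)
        pos += length
    out.append(value[pos:])  # anything past the listed segments is preserved
    return "".join(out)


rng = random.Random(0)  # seeded only to make the sketch repeatable

card = mask_keep_last("1234-5678-9012-1234")  # "****-****-****-1234"
group = generalize_age(16)  # "Teenager"
salary = vary_number(50_000, 10, rng)  # between 45,000 and 55,000
varied = vary_date(date(2017, 1, 15), 14, rng)  # 01/01/2017 to 01/29/2017
part = pattern_variance("ABC123GHI", [(3, "ZYW"), (3, "999"), (3, None)])
# part is "ZYW999GHI"
```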

The following column and data types are supported for masking.

Column Type	Data Type	Rule Type
Character	alphanum, nvarchar, shorttext, and varchar	Mask, Date Generalization, Numeric Generalization, Pattern Variance
Date	date, seconddate, and timestamp	Date Generalization, Date Variance
Numeric	bigint, decimal, double, integer, real, smalldecimal, smallint, and tinyint	Numeric Generalization, Numeric Variance
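
The support table can also be read as a lookup from data type to permitted rule types. The Python sketch below encodes the table as listed here; the `SUPPORTED_RULES` structure and the `rules_for` helper are hypothetical names used for illustration and are not part of the product.

```python
# Column type -> data types and rule types, transcribed from the table above.
SUPPORTED_RULES = {
    "Character": {
        "data_types": {"alphanum", "nvarchar", "shorttext", "varchar"},
        "rule_types": {
            "Mask",
            "Date Generalization",
            "Numeric Generalization",
            "Pattern Variance",
        },
    },
    "Date": {
        "data_types": {"date", "seconddate", "timestamp"},
        "rule_types": {"Date Generalization", "Date Variance"},
    },
    "Numeric": {
        "data_types": {
            "bigint", "decimal", "double", "integer",
            "real", "smalldecimal", "smallint", "tinyint",
        },
        "rule_types": {"Numeric Generalization", "Numeric Variance"},
    },
}


def rules_for(data_type: str) -> set:
    """Return the rule types listed for a data type, or an empty set."""
    key = data_type.lower()
    for entry in SUPPORTED_RULES.values():
        if key in entry["data_types"]:
            return set(entry["rule_types"])
    return set()  # data type does not appear in the table
```

For example, `rules_for("timestamp")` returns the two date rule types, and a data type that is not in the table returns an empty set.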

To configure the Data Mask node:

Place the Data Mask node onto the canvas, and then click the gear icon. The columns available for masking are shown.
In the Data Mask Rule column, click the wrench icon for the column that contains the data you want masked.
Select the type of masking and configure the settings. See the description of options in the separate Mask, Date Generalization, Date Variance, Numeric Generalization, Numeric Variance, and Pattern Variance Type topics.
Click Apply, and then click Back to view the entire flowgraph.

To edit or delete masking rules:

Click the wrench icon next to the rule that you want to change or delete.
To change the rule, click Edit Rule. Make the appropriate changes, and then click Apply.
To delete the rule, click Remove Rule.