Data Mask Examples
See several examples of Mask, Date Generalization, Date Variance, Numeric Generalization, Numeric Variance, and Pattern Variance masking options.
The following examples show how various settings can affect the data.
Mask
- If you have a column for User_ID, the value is Smith7887, and you set the option to Everything except the first 4 characters, your output would be Smitxxxxx.
- If you set the same option to Everything except the last 2 characters, your output would be xxxxxxx87. If your masking character is left blank, your output would be 87.
- If you have a column for Phone1, the value is 800-555-1234, and select Maintain format, your output would be xxx-xxx-xxxx. Not selecting this option would output xxxxxxxxxxxx.
- If you have a column for Email1, the value is john.smith@abc.com, and you enable Maintain format and Mask all characters in email name, then your output would be xxxxxxxxxx@xxx.xxx.
- If you enable Maintain format and disable Mask all characters in email name, then your output would be xxxx.xxxxx@xxx.xxx.
- If you disable both Maintain format and Mask all characters in email name, then your output would be xxxxxxxxxxxxxxxxxx.
Date Generalization
Let's say that you want to divide subscribers into groups based on their birth dates, and that you want to label the era rather than using the actual birth date. Because the age groups are not an equal number of years, it would be best to manually define each range one at a time rather than using the auto range. The default masked value is defined as Out of Range, so that any records that do not belong in the defined ranges will output Out of Range as the masked value.
Minimum | Column Name | Maximum | Masked Value |
---|---|---|---|
1940.01.01<= | BIRTHDATE | <=1964.12.31 | Baby Boomer |
1965.01.01<= | BIRTHDATE | <=1976.12.31 | Gen X |
1977.01.01<= | BIRTHDATE | <=1995.12.31 | Millennial |
Date Variance
The following table shows several examples for one date value in a database: May 27, 1995. The examples help to illustrate how the date is calculated internally, and how it is output when the user-defined minimum and maximum values are used together.
Variance Type | Internally calculated date | Output value within this range | Notes |
---|---|---|---|
Fixed Days Variance: 10 Minimum date: <not set> Maximum date: <not set> |
Min date: May 17, 1995 Max date: Jun 6, 1995 |
Min date: May 17, 1995 Max date: Jun 6, 1995 |
Max date: Jun 6, 1995 |
Fixed Days Variance: 100 Minimum date: Mar 1, 1995 Maximum date: Aug 31, 1995 |
Min date: Feb 9, 1995 Max date: Sep 4, 1995 |
Min date: Mar 1, 1995 Max date: Aug 31, 1995 |
The user-defined minimum and maximum dates are within the calculated minimum and maximum dates. Therefore, the user-defined dates are used. |
Fixed Months Variance: 6 Min date: Jan 1, 1995 Max date: Dec 31, 1995 |
Min date: Nov 27, 1994 Max date: Nov 27, 1995 |
Min date: Jan 1, 1995 Max date: Nov 27, 1995 |
The user-defined minimum date is within the calculated variance, and is the value used for output. The maximum user-defined date is outside of the calculated variance. Therefore, the maximum calculated date is used. |
Fixed Years Variance: 15 Min date: Jan 1, 1965 Max date: Dec 31, 2015 |
Min date: May 27, 1980 Max date: May 27, 2010 |
Min date: May 27, 1980 Max date: May 27, 2010 |
Both of the user-defined minimum and maximum dates are outside of the calculated variance. Therefore the calculated date is used. |
Range Min date: Jan 1, 1965 Max date: Dec 31, 2015 |
n/a |
Min date: Jan 1, 1965 Max date: Dec 31, 2015 |
Because there is no variance for range, only the user-defined minimum and maximum values are used. |
Numeric Generalization
Let's say that you want to categorize your employee numbers by location. Rather than outputting the employee number, the defined masked values replace the value in the EMPNO column. Any numbers that do not fall into the range have the default masked value, Outliers.
Minimum | Column Name | Maximum | Masked Value |
---|---|---|---|
12000<= | EMPNO | <=16999 | Asia_Pac |
17000<= | EMPNO | <=29999 | Europe_Africa |
23000< | EMPNO | <=28999 | Americas |
Numeric Variance
The following table shows several examples for one salary value in a database: $50,000. The examples help to illustrate how the value is calculated internally, and how it is output when the user-defined minimum and maximum values are used together.
Variance Type | Internally calculated value | Output within this range | Notes |
---|---|---|---|
Percentage Variance: 25 Minimum value: <not set> Maximum value: <not set> |
Min value: $37,500 Max value: $62,500 |
Min value: $37,500 Max value: $62,500 |
The output uses the internally calculated values, because the user-defined values are not specified. |
Percentage Variance: 25 Minimum value: 45,000 Maximum value: 85,000 |
Min value: $37,500 Max value: $62,500 |
Min value: $45, 00 Max value: $62,500 |
The user-defined minimum value is within the calculated variance, and is the value used for output. The maximum user-defined date is outside of the calculated variance. Therefore, the maximum calculated value is used. |
Fixed number Variance: 2500 Minimum value: <not set> Maximum value: <not set> |
Min value: $47,500 Max value: $52,500 |
Max value: $52,500 Max value: $52,500 |
The output uses the internally calculated values, because the user-defined values are not specified. |
Fixed number Variance: 2500 Minimum value: 45,000 Maximum value: 85,000 |
Min value: $47,500 Max value: $52,500 |
Min value: $47,500 Max value: $52,500 |
Both the user-defined minimum and maximum values are outside of the calculated variance. Therefore, the calculated minimum and maximum values are used. |
Range Minimum value: 55,000 Maximum value: 95,000 |
n/a |
Minimum value: 55,000 Maximum value: 95,000 |
Because there is no variance for range, only the user-defined minimum and maximum values are used. |
Pattern Variance
-
Default variance type: The following table describes how the default pattern variance masks input characters with like characters.
Input character type Mask values Alphabetic Masks lower-case alpha character with random lower-case alpha character
Masks upper-case alpha character with random upper-case alpha character
Numeric Masks each numeric character with a random numeric character from 0 up to and including 9. Special characters or spaces Does not mask special characters or spaces, but outputs them as they are input, unmasked. For example, when the input substring contains a dash (-), the default pattern variance keeps the dash in the output. When the input substring contains a space, the default pattern variance keeps the space in the output. The following table shows the best practice use of default pattern variance.
Description Best Practice Example Mask an entire input field using the default pattern variance. Add a substring definition with a Default variance type. Starting position: 1
Ending position: <slider all the way to the right>
Automatically apply the default pattern variance to substrings of a mapped input column that are not defined. Define input field substrings using one or more of the other pattern variance types, and leave portions of the input value undefined. Definition 1:
Definition type: Preserve
Starting position: 1
Ending position: 3
Definition 2:
Definition type: String
Starting position: 4
Ending position: 5
Value range min-max: 20-25
Undefined range: 6-end of the value
Results
Definition 1: Preserves the characters in positions 1-3
Definition 2: Masks the entire substring (position 4-5) with a number that is included in the minimum/maximum range (in this case 20-25)
Undefined: Masks position six to the end of the field with the default pattern variance.
-
Preserve variance type: The application outputs the defined substring as it was input, with no masking.
The following table contains an example of the preserve pattern variance.
Strategy Settings Example input/output Notes Preserve the unit identification number in each record. Mask the rest of the value with the default pattern variance. Undefined: character in position 1
Definition:
Definition type: Preserve
Starting position: 2
Ending position: 3
Value: <blank>
Undefined: characters 4 through the end of the value
Input value: B12:G350
Possible output values, preserved characters in bold.
A12:N799
F12:M127
Undefined: Masks the first position with a like character using the default pattern variance.
Definition: Preserves position two and three with the preserve pattern variance. (In this case the numbers 12).
Undefined: masks the fourth position to the end of the string using the default pattern variance.
The colon in the input field (character four) is included in the undefined portion. The software outputs the colon as it was input based on the default pattern variance definition.
-
Character variance type: The application masks each character in the defined substring with a character from the Value option.
The Value option can include individual upper or lower case alpha characters, numeric characters from 0 to 9, ranges of alpha characters, or ranges of numeric characters (using numbers from 0 to 9), spaces, special characters (such as @, #, _, &), or any combination of these.Special characters are output as they are input, without masking them, when they are present in a defined substring for character pattern variance.
The following table contains an example of the character pattern variance.
Strategy Settings Example input/output Notes Mask an identification code with specific alpha or numeric values, and apply the default pattern variance to the remaining portion of the value. Undefined: character in position 1
Definition:
Definition type: Character
Starting position: 2
Ending position: 3
Value: J-L|B|W-Y|2
Undefined: characters 4 through the end of the value
Input value: 123a
Possible output values, masked characters in bold.
8KBx
32Wt
Undefined: Masks the first position with a like character using the default pattern variance.
Definition: Masks position two and three using the character pattern variance and randomly chooses a character specified in the Value field to mask each position.
Undefined: Masks the fourth position to the end of the string using the default pattern variance.
-
String variance type:The application masks the entire defined substring with a random character or string from the Value option.
The Value option can include one or more alpha or numeric characters (such as MILK or 2458), spaces, special characters (such as @, #, _, &), numeric ranges, or any combination of these in a list in the Value option.
The application counts all alphanumeric characters, spaces, and special characters when it determines the substring length. However, the application does not retain the special characters or spaces in the output when they are present in a defined substring for string pattern variance.
The following table contains an example of the string pattern variance.
Strategy Settings Example input/output Notes Preserve the product code, but mask the type of milk (white, chocolate, soy, and so on) with the general term MILK. Definition 1:
Definition type: Preserve
Starting position: 1
Ending position: 5
Value: <blank>
Definition 2:
Definition type: String
Starting position: 6
Ending position: <end of column>
Value: MILK
Input value: 5428-WTMLK
5429-SOYMLK
Possible output values, string characters in bold.
5428-MILK
5429-MILK
Definition 1: Preserves the first through the fifth positions, including the dash, as part of preserve pattern variance. The dash is output as it was input because it is included in the preserve pattern variance.
Definition 2: Masks position six to the end of the field with the value MILK.
Include a zero to the left of any number in a range (to the left of the lower or higher number or to both numbers) so the mask value is left-padded with zeros for the length of the substring.Definition 1:
Definition type: String
Starting position: 1
Ending position: 5
Value: 01-8|999
Undefined: position 6 through the end of the column.
Input value: 04-a1099
Possible output values, string characters in bold.
00003502
999832
Definition 1: Outputs the first through the fifth characters with a number from 1 up to and including 8, or the number 999.
When the application chooses a number from the range as a mask value, it zero-pads the number to the left so the masked value is the length of the defined substring (5 characters).
The application does not zero-pad the number 999 because it does not contain a leading zero in the Value option.
The application includes the dash in the input field in the position count for the substring. However, the application does not output the dash as part of the string pattern variance definition.
Undefined: The application applies the default pattern variance to the undefined portion of the input field, character six to the end of the field, and replaces each numeric value with a random value from 0 to 9.
-
Examples of various pattern variance types by showing example definition option settings, input values, and possible output values.
Strategy Settings Example input/outpu Notes Mask the weight and the unit of measure from a product code, but preserve the product type.
Definition 1:
Definition type: Preserve
Starting position: 1
Ending position: 3
Value: <blank>
Undefined: Position 4 and 5.
Definition 2:
Definition type: String
Starting position: 6
Ending position: 9
Value: GAL|qt|pt|oz|CUP
Input value: MLK12CUP
Possible output values:
MLK63GAL
MLK18pt
MLK04oz
Definition 1: Preserves the first three positions using the preserve pattern variance.
Undefined: Masks the fourth and fifth positions using the default pattern variance, which replaces each numeric character with a value from 0-9.
Definition 2: Masks the sixth through the eighth positions with one of the values listed using the string pattern variance.
Notice that in some cases, the software replaces a 3-character substring with a 2-character value.
Mask the product type and the weight from a product code, but preserve the unit of measure.
Definition 1:
Definition type: String
Starting position: 1
Ending position: 3
Value: ALMLK| SOYMLK| RCEMLK| WTMLK| CHMLK
Definition 2:
Definition type: String
Starting position: 4
Ending position: 5
Value: 01-12| 32| 16
Definition 3:
Definition type: Preserve
Starting position: 6
Ending position: <end of column>
Value: <blank>
Input value: MLK12CUP
Possible output values:
WTMLK32CUP
RCEMLK16CUP
ALMLK08CUP
Definition 1: Masks the first three positions using one of the values specified for string pattern variance.
The application masks the 3-character substring with values that may be longer than 3 characters.
Definition 2: Masks the fourth and fifth positions using the string pattern variance.
The first value listed in the Value option for Definition 2 is a range beginning with a zero-padded number. This ensures that the mask value is the length of the defined substring, 2 characters.
Definition 3: Preserves the sixth through the eighth positions.
Mask the number of paper sheets per package, and the type of packaging from the product description column.
Definition 1:
Definition type: Character
Starting position: 1
Ending position: 4
Value: 0-9
Undefined: Position 5.
Definition 2:
Definition type: String
Starting position: 6
Ending position: <end of column>
Value: Ream| Case| Pack| Box
Input value: 1500/Ream
Possible output values:
0950/Case
8945/Box
2639/Pack
Definition 1: Masks the first through the fourth positions with a number from 0-9.
The user could leave the first through the fifth position undefined so the application masks the substring using the default pattern variance to get similar output values. The forward slash would be output as part of the substring in this case.
Undefined: Outputs the forward slash (/) character in the fifth position using the default pattern variance (maintains special characters on output).
Definition 2: Mask the sixth position to the end of the column with one of the character strings listed in the Value option.
Mask the school district, the state, and the enrollment number. Preserve the type of school.
Definition 1:
Definition type: String
Starting position: 1
Ending position: 3
Value: DST
Definition 2:
Definition type: String
Starting position: 4
Ending position: 5
Value: ST
Undefined: Position 6, 7, 8, and 9.
Definition 3:
Definition type: Preserve
Starting position: 10
Ending position: <end of column>
Value: <blank>
Input value: INDNE7321MID
ANMA7321HIGH
SNBCA7321ELEM
Possible output values:
DSTST3829MID
DSTST5784HIGH
DSTST0789ELEMDefinition 1: Masks position one to three with the string DST.
Definition 2: Mask the fourth and the fifth position with the string ST.
Undefined: Masks position six through nine with the default pattern variance.
Definition 3: Preserves the tenth position to the end of the column.