Anonymized Groups Example
Use anonymization to place masked data into match groups so that you can publish data without risking re-identification of sensitive data.
Before masking, each record might be unique. After you mask several columns of identifying data with Mask, Date Generalization, or Numeric Generalization, some records become identical to one another. In the Anonymization settings, you choose the minimum number of records that a group must contain to be output. Groups that contain fewer records than the number you specify are not output. The higher the number you enter (between 2 and 100), the less likely it is that the data can be re-identified, but fewer records are output. The lower the number, the more likely it is that the data can be re-identified, but more records are output.
The following table shows sample patient data before masking.
Row ID | Patient ID | Name | Age | Postcode | Date of Visit | Issue |
---|---|---|---|---|---|---|
1 | L1234-0987 | James Smith | 1 | 54601 | 05/13/2017 | Ear infection |
2 | R5678-6543 | Allison Zhou | 26 | 54650 | 02/06/2017 | Influenza |
3 | J2345-9876 | Mia Vang | 53 | 55190 | 12/18/2016 | Heart attack |
4 | J6789-5432 | Ben McCleary | 68 | 55118 | 09/25/2016 | Stroke |
5 | P7789-1212 | Franz Gullikson | 2 | 54603 | 04/15/2017 | Ear infection |
6 | R0606-1223 | Joseph Kaswizki | 4 | 54551 | 01/21/2017 | Influenza |
7 | B7212-7306 | Avijit Farooq | 3 | 54601 | 02/02/2017 | Influenza |
8 | R8675-3099 | Alejandro Rodriquez | 18 | 54650 | 11/27/2016 | Concussion |
9 | J0673-1272 | Aleksandra Kaminski | 2 | 54603 | 04/17/2017 | Ear infection |
10 | W1720-0825 | Amanda Barns | 4 | 54601 | 02/04/2017 | Influenza |
The following table shows the same data after masking.
Row ID | Age | Postcode | Date of Visit | Issue |
---|---|---|---|---|
1 | [0-5] | 54*** | Q2 2017 | Ear infection |
2 | [20-29] | 54*** | Q1 2017 | Influenza |
3 | [50-59] | 55*** | Q4 2016 | Heart attack |
4 | [60-69] | 55*** | Q3 2016 | Stroke |
5 | [0-5] | 54*** | Q2 2017 | Ear infection |
6 | [0-5] | 54*** | Q1 2017 | Influenza |
7 | [0-5] | 54*** | Q1 2017 | Influenza |
8 | [15-19] | 54*** | Q4 2016 | Concussion |
9 | [0-5] | 54*** | Q2 2017 | Ear infection |
10 | [0-5] | 54*** | Q1 2017 | Influenza |
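The Postcode and Date of Visit values in the masked table can be reproduced with a short sketch. This is only an illustration of the idea, not the product's implementation: the function names `mask_postcode` and `generalize_date_to_quarter` are hypothetical, and the choice to keep two leading characters is inferred from the sample data. Age generalization is left out because the range boundaries are a setting that this example does not state.

```python
from datetime import datetime


def mask_postcode(postcode: str, keep: int = 2) -> str:
    """Keep the first `keep` characters and replace the rest with asterisks."""
    return postcode[:keep] + "*" * (len(postcode) - keep)


def generalize_date_to_quarter(date_text: str) -> str:
    """Convert an MM/DD/YYYY date string to a 'Qn YYYY' label."""
    parsed = datetime.strptime(date_text, "%m/%d/%Y")
    quarter = (parsed.month - 1) // 3 + 1
    return f"Q{quarter} {parsed.year}"


print(mask_postcode("54601"))                    # 54***
print(generalize_date_to_quarter("05/13/2017"))  # Q2 2017
```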
Now that the data is masked, you can see how the data is placed into anonymized groups.
Age | Postcode | Date of Visit | Issue | Anonymization Group Size |
---|---|---|---|---|
[0-5] | 54*** | Q1 2017 | Influenza | 3 |
[20-29] | 54*** | Q1 2017 | Influenza | 1 |
[0-5] | 54*** | Q2 2017 | Ear infection | 3 |
[50-59] | 55*** | Q4 2016 | Heart attack | 1 |
[60-69] | 55*** | Q3 2016 | Stroke | 1 |
[15-19] | 54*** | Q4 2016 | Concussion | 1 |
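The group sizes in the table above can be checked by counting how many masked rows share the same values. The sketch below assumes, as this example explains further on, that only the Age, Postcode, and Date of Visit columns form the groups. The variable names are hypothetical and the code is not how the product computes groups internally.

```python
from collections import Counter

# Masked rows from the example: (row_id, age, postcode, date_of_visit)
masked_rows = [
    (1, "[0-5]", "54***", "Q2 2017"),
    (2, "[20-29]", "54***", "Q1 2017"),
    (3, "[50-59]", "55***", "Q4 2016"),
    (4, "[60-69]", "55***", "Q3 2016"),
    (5, "[0-5]", "54***", "Q2 2017"),
    (6, "[0-5]", "54***", "Q1 2017"),
    (7, "[0-5]", "54***", "Q1 2017"),
    (8, "[15-19]", "54***", "Q4 2016"),
    (9, "[0-5]", "54***", "Q2 2017"),
    (10, "[0-5]", "54***", "Q1 2017"),
]

# Group on every column except the row ID and count the members of each group.
group_sizes = Counter(row[1:] for row in masked_rows)

for group, size in group_sizes.items():
    print(group, size)
```

Two groups contain three rows each, and the remaining four groups contain a single row.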
If you set the Minimum rows in anonymized group option to 3, only the two groups that contain three records are output, so you can publish six of the ten records.
Row ID | Age | Postcode | Date of Visit | Issue |
---|---|---|---|---|
1 | [0-5] | 54*** | Q2 2017 | Ear infection |
5 | [0-5] | 54*** | Q2 2017 | Ear infection |
6 | [0-5] | 54*** | Q1 2017 | Influenza |
7 | [0-5] | 54*** | Q1 2017 | Influenza |
9 | [0-5] | 54*** | Q2 2017 | Ear infection |
10 | [0-5] | 54*** | Q1 2017 | Influenza |
Now, suppose that the issue in the first row is a concussion rather than an ear infection. Would the record still be published? Yes, because the Issue column is not anonymized. Only the Age, Postcode, and Date of Visit columns are anonymized, so only the data in those columns is used to form anonymized groups. The data in the Issue column has no effect on group membership.
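The filtering rule and the Issue column behavior can both be illustrated with a minimal sketch. The `publishable` function, the `QUASI_IDENTIFIERS` tuple, and the dictionary keys are hypothetical names chosen for this illustration. The `min_rows` parameter stands in for the Minimum rows in anonymized group option.

```python
from collections import Counter

# Columns assumed to form the anonymized groups (hypothetical key names).
QUASI_IDENTIFIERS = ("age", "postcode", "date_of_visit")
COLUMNS = ("row_id",) + QUASI_IDENTIFIERS + ("issue",)

rows = [dict(zip(COLUMNS, values)) for values in [
    (1, "[0-5]", "54***", "Q2 2017", "Ear infection"),
    (2, "[20-29]", "54***", "Q1 2017", "Influenza"),
    (3, "[50-59]", "55***", "Q4 2016", "Heart attack"),
    (4, "[60-69]", "55***", "Q3 2016", "Stroke"),
    (5, "[0-5]", "54***", "Q2 2017", "Ear infection"),
    (6, "[0-5]", "54***", "Q1 2017", "Influenza"),
    (7, "[0-5]", "54***", "Q1 2017", "Influenza"),
    (8, "[15-19]", "54***", "Q4 2016", "Concussion"),
    (9, "[0-5]", "54***", "Q2 2017", "Ear infection"),
    (10, "[0-5]", "54***", "Q1 2017", "Influenza"),
]]


def group_key(row):
    """Return the values that decide which group a row belongs to."""
    return tuple(row[column] for column in QUASI_IDENTIFIERS)


def publishable(rows, min_rows):
    """Keep only rows whose group has at least `min_rows` members."""
    sizes = Counter(group_key(row) for row in rows)
    return [row for row in rows if sizes[group_key(row)] >= min_rows]


published_ids = [row["row_id"] for row in publishable(rows, min_rows=3)]
print(published_ids)  # [1, 5, 6, 7, 9, 10]

# Change the issue in row 1: the published row IDs stay the same,
# because the issue is not part of the group key.
changed = [dict(row, issue="Concussion") if row["row_id"] == 1 else row
           for row in rows]
changed_ids = [row["row_id"] for row in publishable(changed, min_rows=3)]
print(changed_ids == published_ids)  # True
```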