Modeling Guide

Seed Value

Set the Seed value to maintain referential integrity.

Setting the Seed option still masks the data, but in a way that produces consistent values each time the data is output. Suppose that you are masking the Customer_ID value and want each ID to be randomized on output. You can use any combination of numbers and characters to create an identifiable seed value, such as Region9_Cust. The seed value itself is not output; it only ensures that the output data is consistent each time the flowgraph is run.

For example, suppose that you are running a Numeric Variance with a Fixed Number and have set the Variance option to 5.

Input data    Valid output range
2550          2545-2555
3000          2995-3005
5500          5495-5505

After the first run, suppose the output data is:

Output data after initial processing
2552
3001
5505

With the seed value set, subsequent processing keeps the same output for each record. Without the seed value set, the output continues to be randomized on each run.

Second run with the seed value set    Second run without the seed value
2552                                  2554
3001                                  2998
5505                                  5497
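
The behavior in the tables above can be imitated with a short Python sketch. This is only an illustration under stated assumptions, not the product's masking algorithm: the function name mask_numeric is hypothetical, and deriving the random generator from a hash of the seed and the input value is just one way to make the output repeatable.

```python
import hashlib
import random


def mask_numeric(value, variance, seed=None):
    """Shift value by a random integer offset in [-variance, +variance].

    Hypothetical helper: with a seed, the offset is derived from the seed
    and the input value, so the same input always gets the same output.
    """
    if seed is None:
        # No seed: a fresh generator, so the offset can change on every run.
        rng = random.Random()
    else:
        # Assumption: hash seed + value to get a repeatable generator state.
        digest = hashlib.sha256(f"{seed}:{value}".encode("utf-8")).digest()
        rng = random.Random(int.from_bytes(digest[:8], "big"))
    return value + rng.randint(-variance, variance)


inputs = [2550, 3000, 5500]

# Two "runs" with the same seed produce identical masked values.
first_run = [mask_numeric(v, 5, seed="Region9_Cust") for v in inputs]
second_run = [mask_numeric(v, 5, seed="Region9_Cust") for v in inputs]

# A run without a seed stays inside the valid range but is not repeatable.
unseeded_run = [mask_numeric(v, 5) for v in inputs]
```

In this sketch the seed string never appears in the output; it only fixes which offset each input receives, which mirrors how the seed value is described above.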

Example

Retain referential integrity using a seed value to keep the altered values the same when you run a job multiple times.

Date variance seed example: If you randomize the input value "June 10, 2016" by 5 days, the output will be a date between "June 5, 2016" and "June 15, 2016". If the output for the first run is "June 9, 2016", using the seed value will output "June 9, 2016" on all subsequent runs, so that you can be certain the data is consistent. Not using the seed value might return "June 11, 2016" on the next run and "June 7, 2016" on the following run.
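
As a rough illustration of the date case, the following Python sketch applies the same idea to dates. It rests on assumptions: mask_date is a hypothetical name, and the hash-based seeding is an illustrative choice rather than how the product computes its output.

```python
import hashlib
import random
from datetime import date, timedelta


def mask_date(value, variance_days, seed=None):
    """Shift a date by a random whole number of days within +/- variance_days.

    Hypothetical helper: with a seed, the shift depends only on the seed
    and the input date, so repeated runs return the same masked date.
    """
    if seed is None:
        # No seed: the shift can differ on every call.
        rng = random.Random()
    else:
        # Assumption: hash seed + ISO date to get a repeatable generator state.
        key = f"{seed}:{value.isoformat()}".encode("utf-8")
        digest = hashlib.sha256(key).digest()
        rng = random.Random(int.from_bytes(digest[:8], "big"))
    return value + timedelta(days=rng.randint(-variance_days, variance_days))


original = date(2016, 6, 10)

# Three seeded "runs": every result is the same date between June 5 and June 15.
seeded_runs = [mask_date(original, 5, seed="Region9_Cust") for _ in range(3)]

# An unseeded run stays in the same range but may differ from run to run.
unseeded = mask_date(original, 5)
```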

Numeric variance seed example: If you randomize the input value "500" with a fixed value of 5, the output will be a number between 495 and 505. If the output for the first run is "499", using the seed value will output "499" on all subsequent runs, so that you can be certain the data is consistent. Not using the seed value might return "503" on the next run and "498" on the following run.