Properties that can be configured for the Sample Preparation Component.
Use this component to select a subset of data from a large dataset.
The Sample component supports the following sample types:
- First N: Selects the first N records in the dataset.
- Last N: Selects the last N records in the dataset.
- Every Nth: Selects every Nth record in the dataset, where N is the interval. For example, if N=2, the 2nd, 4th, 6th, 8th, and so on records are selected.
- Simple Random: Randomly selects N records, or N percent of the records, from the dataset.
- Systematic Random: Divides the dataset into consecutive intervals (buckets) of the configured bucket size. The Sample component selects the record at a random position N within the first bucket, and then selects the record at the same position N in each subsequent bucket.
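The five sample types above can be sketched as plain Python functions over an ordered list of records. This is an illustrative sketch, not the component's implementation: the function names are hypothetical, and it assumes that Simple Random returns the chosen rows in dataset order, that a percentage is rounded down to a whole row count, and that a final bucket shorter than the bucket size is still sampled if it contains the chosen position.

```python
import math
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def first_n(rows: Sequence[T], n: int) -> List[T]:
    """First N: the first n records, in dataset order."""
    return list(rows[:n])


def last_n(rows: Sequence[T], n: int) -> List[T]:
    """Last N: the last n records, in dataset order."""
    # rows[-0:] would return every row, so handle n <= 0 explicitly.
    return list(rows[-n:]) if n > 0 else []


def every_nth(rows: Sequence[T], n: int) -> List[T]:
    """Every Nth: records at 1-based positions n, 2n, 3n, ..."""
    if n < 1:
        raise ValueError("interval must be at least 1")
    return list(rows[n - 1::n])


def simple_random(rows: Sequence[T], n: Optional[int] = None,
                  percent: Optional[float] = None,
                  rng: Optional[random.Random] = None) -> List[T]:
    """Simple Random: n records, or percent% of the records, chosen at random."""
    if rng is None:
        rng = random.Random()
    if percent is not None:
        # Assumption: a fractional row count is rounded down.
        n = math.floor(len(rows) * percent / 100)
    if n is None:
        raise ValueError("pass either n or percent")
    n = min(max(n, 0), len(rows))
    # Draw distinct row indices, then return the rows in dataset order (assumption).
    picked = sorted(rng.sample(range(len(rows)), n))
    return [rows[i] for i in picked]


def systematic_random(rows: Sequence[T], bucket_size: int,
                      rng: Optional[random.Random] = None) -> List[T]:
    """Systematic Random: one random position, reused in every bucket."""
    if bucket_size < 1:
        raise ValueError("bucket size must be at least 1")
    if rng is None:
        rng = random.Random()
    offset = rng.randrange(bucket_size)  # 0-based position inside the first bucket
    # Stepping by bucket_size from that offset lands on the same position in
    # every later bucket; a short final bucket contributes nothing if it has
    # no record at that position.
    return list(rows[offset::bucket_size])


if __name__ == "__main__":
    ids = list(range(1, 11))  # Emp IDs 1..10
    print(first_n(ids, 5))    # [1, 2, 3, 4, 5]
    print(every_nth(ids, 3))  # [3, 6, 9]
```

Seeding a `random.Random` instance, as `rng` allows here, makes the random sample types repeatable in tests; whether the Sample component exposes a seed is not stated in this documentation.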
Sample Properties
Table 1: Data Preparation Component Properties
| Property | Description |
|---|---|
| Sampling Type | Select the type of sampling. |
| Limit Rows by | Select the method for limiting the rows. |
| Number of Rows | Enter the number of rows you want to select. |
| Percentage of Rows | Enter the percentage of rows you want to select. |
| Bucket Size | Enter the bucket size within which you want to select a random row. |
| Step Size | Enter the interval between the rows you want to select. |
| Maximum Rows | Enter the maximum number of rows you want to select. |
Example: Selecting a subset of data from a given dataset
| Emp ID | Emp Name | DOB | Age |
|---|---|---|---|
| 1 | Laura | 11/11/1986 | 25 |
| 2 | Desy | 12/5/1981 | 30 |
| 3 | Alex | 30/5/1978 | 33 |
| 4 | John | 6/6/1979 | 32 |
| 5 | Ted | 4/7/1987 | 24 |
| 6 | Tom | 30/6/1970 | 41 |
| 7 | Anna | 24/6/1965 | 46 |
| 8 | Valerie | 6/7/1990 | 21 |
| 9 | Mary | 19/9/1985 | 26 |
| 10 | Martin | 21/11/1986 | 25 |
Sample outputs:
- First N: For N=5

| Emp ID | Emp Name | DOB | Age |
|---|---|---|---|
| 1 | Laura | 11/11/1986 | 25 |
| 2 | Desy | 12/5/1981 | 30 |
| 3 | Alex | 30/5/1978 | 33 |
| 4 | John | 6/6/1979 | 32 |
| 5 | Ted | 4/7/1987 | 24 |

- Last N: For N=4

| Emp ID | Emp Name | DOB | Age |
|---|---|---|---|
| 7 | Anna | 24/6/1965 | 46 |
| 8 | Valerie | 6/7/1990 | 21 |
| 9 | Mary | 19/9/1985 | 26 |
| 10 | Martin | 21/11/1986 | 25 |

- Every Nth: Interval=3

| Emp ID | Emp Name | DOB | Age |
|---|---|---|---|
| 3 | Alex | 30/5/1978 | 33 |
| 6 | Tom | 30/6/1970 | 41 |
| 9 | Mary | 19/9/1985 | 26 |

- Simple Random: For Number of Rows=2

The result can be any two rows. One possible result is:

| Emp ID | Emp Name | DOB | Age |
|---|---|---|---|
| 7 | Anna | 24/6/1965 | 46 |
| 8 | Valerie | 6/7/1990 | 21 |

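- Systematic Random: Bucket Size=4

The result depends on the position chosen at random within the first bucket. One possible result is:

| Emp ID | Emp Name | DOB | Age |
|---|---|---|---|
| 2 | Desy | 12/5/1981 | 30 |
| 6 | Tom | 30/6/1970 | 41 |
| 10 | Martin | 21/11/1986 | 25 |

Another possible result is:

| Emp ID | Emp Name | DOB | Age |
|---|---|---|---|
| 1 | Laura | 11/11/1986 | 25 |
| 5 | Ted | 4/7/1987 | 24 |
| 9 | Mary | 19/9/1985 | 26 |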
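The deterministic outputs above, and every possible Systematic Random outcome, can be reproduced with plain Python slicing on the Emp ID column. This is a sketch, not the component's implementation, and Simple Random is left out because its result is random. It assumes that a final bucket shorter than the bucket size is kept: with a bucket size of 4, the buckets hold IDs 1-4, 5-8, and 9-10, so the two Systematic Random results shown correspond to positions 2 and 1, while positions 3 and 4 would each yield only two rows.

```python
# Emp IDs of the ten-row example dataset, in dataset order.
ids = list(range(1, 11))

first_5 = ids[:5]      # First N, N=5
last_4 = ids[-4:]      # Last N, N=4
every_3rd = ids[2::3]  # Every Nth, interval 3 (1-based positions 3, 6, 9)

# Systematic Random, bucket size 4: each possible random position within the
# first bucket (1-based) determines the whole result.
bucket_size = 4
outcomes = {pos: ids[pos - 1::bucket_size] for pos in range(1, bucket_size + 1)}

print(first_5)    # [1, 2, 3, 4, 5]
print(last_4)     # [7, 8, 9, 10]
print(every_3rd)  # [3, 6, 9]
for pos, selected in outcomes.items():
    print(pos, selected)
# 1 [1, 5, 9]
# 2 [2, 6, 10]
# 3 [3, 7]
# 4 [4, 8]
```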