Sample

Properties that can be configured for the Sample Preparation Component.

Use this component to select a subset of data from large datasets.
The Sample component supports the following sample types:
  • First N: Selects the first N records in the dataset.
  • Last N: Selects the last N records in the dataset.
  • Every Nth: Selects every Nth record in the dataset, where N is an interval. For example, if N=2, the 2nd, 4th, 6th, and 8th records are selected, and so on.
  • Simple Random: Randomly selects either N records or N percent of the records in the dataset.
  • Systematic Random: Divides the dataset into sample intervals, or buckets, of the specified bucket size. The Sample component picks a position N at random within the first bucket, and then selects the record at the same position N in each subsequent bucket.
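The five sample types above can be sketched in a few lines of Python. This is only an illustration of the selection rules as described, not the component's actual implementation: the function names, the `rng` parameter, the rounding used for percentages, and the choice to return rows in their original order are all assumptions made for the sketch.

```python
import random

def first_n(rows, n):
    """First N: the first n records."""
    return rows[:n]

def last_n(rows, n):
    """Last N: the last n records."""
    return rows[-n:] if n > 0 else []

def every_nth(rows, step):
    """Every Nth: the step-th, 2*step-th, ... records (1-based); step must be >= 1."""
    return rows[step - 1::step]

def simple_random(rows, n=None, percent=None, rng=random):
    """Simple Random: n records, or percent of the records, chosen at random."""
    if n is None:
        # Assumption: the percentage is rounded to the nearest whole row.
        n = round(len(rows) * percent / 100)
    n = min(n, len(rows))
    picked = sorted(rng.sample(range(len(rows)), n))
    return [rows[i] for i in picked]

def systematic_random(rows, bucket_size, rng=random):
    """Systematic Random: a random position in the first bucket,
    then the same position in every following bucket."""
    offset = rng.randrange(bucket_size)  # 0-based position inside a bucket
    return rows[offset::bucket_size]
```

Passing a seeded `random.Random` instance as `rng` makes the two random sample types repeatable, which is convenient when checking results by hand.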
Sample Properties
Table 1: Data Preparation Component Properties
Property              Description
Sampling Type         Select the type of sampling.
Limit Rows by         Select the method for limiting the rows.
Number of Rows        Enter the number of rows you want to select.
Percentage of Rows    Enter the percentage of rows you want to select.
Bucket Size           Enter the bucket size within which you want to select a random row.
Step Size             Enter the interval between the rows you want to select.
Maximum Rows          Enter the maximum number of rows you want to select.

Example:

Selecting a subset of data from a given dataset
Emp ID Emp Name DOB Age
1 Laura 11/11/1986 25
2 Desy 12/5/1981 30
3 Alex 30/5/1978 33
4 John 6/6/1979 32
5 Ted 4/7/1987 24
6 Tom 30/6/1970 41
7 Anna 24/6/1965 46
8 Valerie 6/7/1990 21
9 Mary 19/9/1985 26
10 Martin 21/11/1986 25
Sample outputs:
  1. First N: For N=5
    Emp ID Emp Name DOB Age
    1 Laura 11/11/1986 25
    2 Desy 12/5/1981 30
    3 Alex 30/5/1978 33
    4 John 6/6/1979 32
    5 Ted 4/7/1987 24
  2. Last N: For N=4
    Emp ID Emp Name DOB Age
    7 Anna 24/6/1965 46
    8 Valerie 6/7/1990 21
    9 Mary 19/9/1985 26
    10 Martin 21/11/1986 25
  3. Every Nth: Interval=3
    Emp ID Emp Name DOB Age
    3 Alex 30/5/1978 33
    6 Tom 30/6/1970 41
    9 Mary 19/9/1985 26
  4. Simple Random: For number of rows=2

    The result can be any two rows.

    Emp ID Emp Name DOB Age
    7 Anna 24/6/1965 46
    8 Valerie 6/7/1990 21
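  5. Systematic Random: Bucket Size=4

    The result depends on the position chosen at random within the first bucket. Two of the possible results:
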
    Emp ID Emp Name DOB Age
    2 Desy 12/5/1981 30
    6 Tom 30/6/1970 41
    10 Martin 21/11/1986 25

    or

    Emp ID Emp Name DOB Age
    1 Laura 11/11/1986 25
    5 Ted 4/7/1987 24
    9 Mary 19/9/1985 26
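
The two Systematic Random results above correspond to different random positions within the first bucket. As a rough check, the short Python sketch below lists the Emp IDs that each possible position would select for Bucket Size=4. It assumes that a final, partial bucket (here, Emp IDs 9 and 10) contributes a row only when the chosen position falls inside it; the variable names are illustrative only.

```python
emp_ids = list(range(1, 11))  # Emp IDs 1..10 from the example dataset
bucket_size = 4

# Map each 1-based position inside a bucket to the Emp IDs it would select.
outcomes = {
    offset + 1: emp_ids[offset::bucket_size]
    for offset in range(bucket_size)
}

for position, ids in outcomes.items():
    print(f"position {position} in each bucket -> Emp IDs {ids}")
```

Under these assumptions, position 1 yields Emp IDs 1, 5, and 9, and position 2 yields 2, 6, and 10, matching the two outputs shown; positions 3 and 4 would yield only two rows each (3 and 7, or 4 and 8).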