Modeling Guide for SAP Data Hub

Working with Flowagent Subengine to Connect to Databases

Partitioning the source data allows us to load data in chunks there by overcome memory pressure and by doing it in parallel we can load data faster and improve overall performance.

Partitioning

Below are supported partitioning methods with the 4055624226f14da5923278c36ec23a0e.html operators.

Logical Partition

Logical Partitions are user-defined partitions on how to read data from source. For Table Consumer, the user can choose the partition type then add the partition specification as described below.

  • defaultPartition: The partition which will fetch all rows which were not filtered by the other partitions (you can disable it by setting defaultPartition to false);

  • conditions: Each condition represents a set of filters which will be performed on a certain column;
    • columnExpression: The column where the filters will be applied. It can be either the column name or a function on it (i.e., TRUNC(EMPLOYEE_NAME));

    • type: The filter type. It can be either "LIST" (to filter exact values) or "RANGE" (to filter a range of values);

    • dataType: The data type of the elements. It can be either "STRING", "NUMBER" or "EXPRESSION";

    • elements: Each element represents a different filter, and its semantics depends on the "type" (see above).

  • When more than one condition is presented, it will be performed a cartesian product among the elements of each condition, i.e.:

    • Condition A: [C1, C2]

    • Condition B: [C3, C4]

    • Resulting filters: [(C1, C3), (C1, C4), (C2, C3), (C2, C4)]

Ex: Let's assume we have a source table called EMPLOYEE which has the following schema:
CREATE TABLE EMPLOYEE(
    EMPLOYEE_ID NUMBER(5) PRIMARY KEY,
    EMPLOYEE_NAME VARCHAR(32),
    EMPLOYEE_ADMISSION_DATE DATE
);

RANGE

A JSON representing a RANGE partition is shown below, where the elements represent ranges, thus, they need to be sorted in ascending order.

Numeric Partitions

{
  "defaultPartition": true,
  "conditions": [
    {
      "columnExpression": "EMPLOYEE_ID",
      "type": "RANGE",
      "dataType": "NUMBER",
      "elements": [
        "10",
        "20",
        "30"
      ]
    }
  ]
}

String Partitions

{
  "defaultPartition": true,
  "conditions": [
    {
      "columnExpression": "EMPLOYEE_NAME",
      "type": "RANGE",
      "dataType": "STRING",
      "elements": [
        "M",
        "T"
      ]
    }
  ]
}

Expression Partitions

{
  "defaultPartition": true, 
  "conditions": [
    {
      "columnExpression": "EMPLOYEE_ADMISSION_DATE",
      "type": "RANGE",
      "dataType": "EXPRESSION",
      "elements": [
        "TO_DATE('2012-01-01', 'YYYY-MM-DD')",
        "TO_DATE('2015-01-01', 'YYYY-MM-DD')"
      ]
    }
  ]
}

LIST

List partitions can be used to filter exact values. Each element represents a different filter.

Numeric partitions

{
  "defaultPartition": true, 
  "conditions": [
    {
      "columnExpression": "EMPLOYEE_ID",
      "type": "LIST",
      "dataType": "NUMBER",
      "elements": [
        "10",
        "20",
        "50"
      ]
    }
  ]
}

String partitions

{
  "defaultPartition": false,
  "conditions": [
    {
      "columnExpression": "EMPLOYEE_NAME",
      "type": "LIST",
      "dataType": "STRING",
      "elements": [
        "Jhon",
        "Ana",
        "Beatrice"
      ]
    }
  ]
}

Expression partitions

{
  "defaultPartition": false,
  "conditions": [
    {
      "columnExpression": "TRUNC(EMPLOYEE_ADMISSION_DATE)",
      "type": "LIST",
      "dataType": "EXPRESSION",
      "elements": [
        "TO_DATE('2012-07-17', 'YYYY-MM-DD')"
      ]
    }
  ]
}

COMBINED

{
  "defaultPartition": true, 
  "conditions": [
    {
      "columnExpression": "EMPLOYEE_NAME",
      "type": "LIST",
      "dataType": "NUMBER",
      "elements": [
        "Beatrice",
        "Ana"
      ]
    },
    {
      "columnExpression": "EMPLOYEE_ADMISSION_DATE",
      "type": "RANGE",
      "dataType": "EXPRESSION",
      "elements": [
        "TO_DATE('2015-01-01', 'YYYY-MM-DD')",
        "TO_DATE('2016-01-01', 'YYYY-MM-DD')"
      ]
    }
  ]
}

Physical Partition

Physical partitions are partitions which are defined directly on the source, using Oracle partitioning concept.

Limitations:

  • Hash partitioning is not supported.

  • Sub-partitioning is not supported.

Row ID Partition

Row ID partitions are partitions which are generated automatically based on the row id of the columns. The user must supply the number of partitions, and the range of row id partitions will be generated automatically based on it.