Modeling Guide for SAP Data Hub

Vora Ingestor

The Vora Ingestor operator dynamically ingests data into SAP Vora based on incoming record messages. The database table and its column definitions are determined by the metadata included in the message under the attribute "vora.record.definition".

The target table is created automatically from the information provided in the metadata. If the table already exists, its columns must match the columns defined in the metadata. This metadata is typically generated by the SAP Vora Avro Decoder, which extracts record fields, based on the provided Avro schema, from input data in formats such as Avro, JSON, and CSV.

Sample Graph

Description

com.sap.demo.vora.ingestion.avro_ingestion_example_disk

Reading Avro messages from Kafka and writing to Vora Disk Engine.

com.sap.demo.vora.ingestion.avro_ingestion_example_series

Reading Avro messages from Kafka and writing to Vora Timeseries Engine.

com.sap.demo.vora.ingestion.csv_ingestion_example2_disk

Generating CSV messages and writing to Vora Disk Engine.

com.sap.demo.vora.ingestion.csv_ingestion_example2_series

Generating CSV messages and writing to Vora Timeseries Engine.

com.sap.demo.vora.ingestion.csv_ingestion_example3_disk

Generating CSV messages and writing to Vora Disk Engine using an Avro schema with additional metadata to customize the table.

com.sap.demo.vora.ingestion.json_ingestion_example2_disk

Generating JSON messages and writing to Vora Disk Engine using an Avro schema with additional metadata to customize the table.

com.sap.demo.vora.ingestion.rec_ingestion_example2_disk

Generating record messages and writing to Vora Disk Engine.

Configuration Parameters

Parameter

Type

Description

connectionType

string

The connection to SAP Vora can be configured directly using dsn or indirectly using connection.

Default: "dsn"

dsn

string

A valid data source name in the format v2://host:port/?binary=true. Make sure that you add /?binary=true to the end, because only binary transfer is available for the SAP Vora Transaction Coordinator.

Default: "v2://localhost:2204/?binary=true"
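
The DSN format described above can be checked before it is placed in the operator configuration. The following is a minimal Python sketch, not part of SAP Data Hub: the helper names build_dsn and is_valid_dsn are hypothetical, and the check only covers what this page states (the v2 scheme, a host, a port, and the binary=true query flag).

```python
from urllib.parse import urlsplit, parse_qs


def build_dsn(host: str, port: int) -> str:
    """Build a DSN in the documented v2://host:port/?binary=true form."""
    return f"v2://{host}:{port}/?binary=true"


def is_valid_dsn(dsn: str) -> bool:
    """Check the scheme, host, port, and the mandatory binary=true flag.

    Hypothetical helper: the operator itself may accept or reject
    DSNs by different rules.
    """
    parts = urlsplit(dsn)
    try:
        port = parts.port  # raises ValueError for a non-numeric port
    except ValueError:
        return False
    query = parse_qs(parts.query)
    return (
        parts.scheme == "v2"
        and bool(parts.hostname)
        and port is not None
        and query.get("binary") == ["true"]
    )
```

For instance, is_valid_dsn("v2://localhost:2204") is false in this sketch because the /?binary=true suffix is missing.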

user

string

The user name if the connection is configured using dsn.

Default: ""

password

string

The password if the connection is configured using dsn.

Default: ""

connection

object

A valid connection configuration provided by ConnectionManager.

aggregation

bool

Enables the automatic aggregation of records to trigger a series of bulk inserts independently of the number of records contained in each Avro message.

Default: false

aggregateMaxBytes

int

The maximum total size, in bytes, of the aggregated records in auto-aggregation mode. Records are aggregated until this limit is reached.

Default: 4194304

aggregateMaxRecs

int

The maximum number of aggregated records in auto-aggregation mode. Records are aggregated until this limit is reached.

Default: 1000

aggregateMaxTime

int

The maximum time, in milliseconds, to wait before flushing aggregated records that have not yet reached the size constraints.

Default: 2000
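
How the three aggregation limits interact can be illustrated with a small buffer that flushes a batch when any limit is reached. This is a sketch under stated assumptions, not the operator's implementation: the class RecordAggregator and its methods are hypothetical, records are modeled as bytes objects, the timer is assumed to start when the first record enters an empty buffer, and a limit is assumed to trigger at "greater than or equal".

```python
import time


class RecordAggregator:
    """Buffers records and releases them as batches for bulk insert."""

    def __init__(self, max_bytes=4194304, max_recs=1000, max_time_ms=2000,
                 clock=time.monotonic):
        self.max_bytes = max_bytes      # mirrors aggregateMaxBytes
        self.max_recs = max_recs        # mirrors aggregateMaxRecs
        self.max_time_ms = max_time_ms  # mirrors aggregateMaxTime
        self._clock = clock             # injectable clock, in seconds
        self._buffer = []
        self._bytes = 0
        self._first_at = None

    def add(self, record: bytes):
        """Buffer one record; return a batch if a size limit is hit, else None."""
        if not self._buffer:
            self._first_at = self._clock()
        self._buffer.append(record)
        self._bytes += len(record)
        if self._bytes >= self.max_bytes or len(self._buffer) >= self.max_recs:
            return self._flush()
        return None

    def poll(self):
        """Return the pending batch if it has waited max_time_ms, else None."""
        if self._buffer:
            waited_ms = (self._clock() - self._first_at) * 1000
            if waited_ms >= self.max_time_ms:
                return self._flush()
        return None

    def _flush(self):
        """Hand over the buffered records and reset the counters."""
        batch = self._buffer
        self._buffer, self._bytes, self._first_at = [], 0, None
        return batch
```

In this sketch a caller would invoke add for each incoming record and poll periodically, sending every returned batch to SAP Vora as one bulk insert.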

databaseSchema

string

The database schema name.

Default: "TPCH"

engineType

string

The engine type ("DISK" or "SERIES").

Default: "DISK"

partitionKeyRegex

string

A regular expression that selects a sequence of Avro record field names to be bound to the arguments of the specified partition function. Details: When generating the partitioning scheme (the actual binding of arguments to the parameters of partition keys), the system uses this regular expression to match actual columns to the parameters of the partition scheme.

Example: Suppose we have a table "T(col1 VARCHAR(500), col2 BIGINT, col3 DATE)" and a hash partition function "pf(pa, pb)". Then, setting partitionKeyRegex to ".*1|.*3" selects col1 and col3 to be bound to the parameters pa and pb, respectively.

Default: ""
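
The column selection in the example above can be reproduced with an ordinary regular expression library. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes that the expression must match the whole field name and that matched fields are bound to the partition function parameters in table column order. The operator's actual matching rules may differ.

```python
import re


def select_partition_columns(field_names, partition_key_regex):
    """Return the field names that fully match the regular expression."""
    pattern = re.compile(partition_key_regex)
    return [name for name in field_names if pattern.fullmatch(name)]


def bind_partition_args(field_names, partition_key_regex, params):
    """Bind the selected columns to partition function parameters in order."""
    selected = select_partition_columns(field_names, partition_key_regex)
    if len(selected) != len(params):
        raise ValueError(
            f"expected {len(params)} matching columns, found {len(selected)}"
        )
    return dict(zip(params, selected))


# Table T(col1, col2, col3) and partition function pf(pa, pb)
binding = bind_partition_args(["col1", "col2", "col3"], ".*1|.*3", ["pa", "pb"])
print(binding)  # {'pa': 'col1', 'pb': 'col3'}
```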

partitionCriterion

string

A partition function.

Default: false

tableType

string

The table type ("STREAMING").

Default: "STREAMING"

ingestionMode

string

The ingestion mode ("INSERT" or "UPSERT").

Default: "INSERT"

primaryKeyRegex

string

A regular expression that selects the Avro record field names to be used as the primary keys. The primary keys may also be specified in the Avro schema using the extension property "primaryKey". This parameter is only useful when the primary keys are not specified in the Avro schema.

Default: ""
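
The documented defaults can be collected into a single structure to make overrides explicit. This is a minimal Python sketch, not an SAP Data Hub API: DEFAULT_CONFIG, ALLOWED_VALUES, and make_config are hypothetical names, the values "dsn" and "connection" for connectionType are assumed from the parameter description, and connection and partitionCriterion are left out because this page gives no usable string default for them.

```python
# Defaults as listed in the parameter table above.
DEFAULT_CONFIG = {
    "connectionType": "dsn",
    "dsn": "v2://localhost:2204/?binary=true",
    "user": "",
    "password": "",
    "aggregation": False,
    "aggregateMaxBytes": 4194304,
    "aggregateMaxRecs": 1000,
    "aggregateMaxTime": 2000,
    "databaseSchema": "TPCH",
    "engineType": "DISK",
    "partitionKeyRegex": "",
    "tableType": "STREAMING",
    "ingestionMode": "INSERT",
    "primaryKeyRegex": "",
}

# Enumerated values named in the parameter descriptions
# (connectionType values are an assumption).
ALLOWED_VALUES = {
    "connectionType": {"dsn", "connection"},
    "engineType": {"DISK", "SERIES"},
    "tableType": {"STREAMING"},
    "ingestionMode": {"INSERT", "UPSERT"},
}


def make_config(**overrides):
    """Return the defaults with overrides applied, rejecting bad values."""
    unknown = set(overrides) - set(DEFAULT_CONFIG)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    config = {**DEFAULT_CONFIG, **overrides}
    for name, allowed in ALLOWED_VALUES.items():
        if config[name] not in allowed:
            raise ValueError(f"{name} must be one of {sorted(allowed)}")
    return config
```

For example, make_config(engineType="SERIES", ingestionMode="UPSERT") yields a configuration for the time series engine with upserts, while all other parameters keep their documented defaults.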

Input

Input

Type

Description

in

message

Accepts record messages to be processed.

Output

Output

Type

Description

out

message

Messages with header properties that indicate the commit progress. If the input message does not contain a commit token (that is, the message header message.commit.token), no output is generated.