Modeling Guide

Avro PreIngestor

The SAP Vora Avro PreIngestor operator decodes encoded messages (Avro, CSV, or JSON) so that the decoded messages can be processed before they are stored in SAP Vora by the SAP Vora Ingestor operator.

For direct ingestion into SAP Vora, use the SAP Vora Avro Ingestor operator instead. An Avro schema describes the structure of the record, and the Vora table schema is derived from it. The Avro schema can be provided in several ways. An Avro message may include its own schema. Otherwise, the schema can be supplied either in this operator's configuration (defaultAvroSchema) or in the message attribute "avro.schema".
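The following Go sketch illustrates one way to pick the schema for a message. The helper resolveSchema and its precedence (embedded schema first, then the "avro.schema" attribute, then defaultAvroSchema) are assumptions for illustration only; this page does not state which source wins when several are present.

```go
package main

import (
	"errors"
	"fmt"
)

// resolveSchema is a hypothetical helper that selects the Avro schema for one message.
// ASSUMPTION: the precedence below is illustrative, not the operator's documented behavior.
func resolveSchema(embedded string, attrs map[string]interface{}, defaultAvroSchema string) (string, error) {
	if embedded != "" {
		return embedded, nil // schema carried inside an Avro message
	}
	if s, ok := attrs["avro.schema"].(string); ok && s != "" {
		return s, nil // schema attached as the "avro.schema" message attribute
	}
	if defaultAvroSchema != "" {
		return defaultAvroSchema, nil // schema from the operator configuration
	}
	return "", errors.New("no Avro schema available for this message")
}

func main() {
	attrs := map[string]interface{}{"avro.schema": `{"type":"record","name":"r","fields":[]}`}
	s, err := resolveSchema("", attrs, "")
	fmt.Println(s, err)
}
```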

When CSV messages are used, each line is interpreted against the provided Avro schema. If the CSV has no header line, values are assigned by position (the n-th CSV value goes to the n-th field of the Avro record). If a header line is present, each value is assigned to the field whose name matches its column header.

The following table lists the supported Avro primitive and logical types, their Go and JavaScript representations, sample CSV values, and the Vora SQL types they are translated into.

Avro | go | js | CSV (sample values) | Vora
boolean | bool | boolean | true | BOOLEAN
int | int32 | Number | 9 | INTEGER
long | int64 | Number | 99 | BIGINT
float | float32 | Number | 99.9 | FLOAT
double | float64 | Number | 99.99 | DOUBLE
bytes | string | string | aGVsbG8= | VARCHAR(*)
string | string | string | hello | VARCHAR(*)
decimal(p,s) | string | string | 1987.74 | DECIMAL(p,s)
date | string | string | 2017-08-29 | DATE
time-millis | string | string | 15:28:50.345 | TIME
time-micros | string | string | 15:28:50.345678 | TIME
timestamp-millis | time.Time | string | 2017-08-29 15:28:50.345 | TIMESTAMP
timestamp-micros | time.Time | string | 2017-08-29 15:28:50.345678 | TIMESTAMP
fixed(n) | string | string | 68656c6c6f | VARCHAR(2n)

Note: aGVsbG8= and 68656c6c6f are the base64 and hexadecimal representations of the value hello, respectively.

The Avro-to-Vora type association above can be customized by adding extension properties to the Avro schema. The following table describes the supported extension properties.

Extension Property | Supported in Avro Types | Property Type | Description
colName | all types | string | Specifies the column name.
size | int | Number | Specifies the bit size: 8, 16, 32, or 64 selects TINYINT, SMALLINT, INTEGER, or BIGINT, respectively.
maxLength | string, bytes | Number | Specifies the maximum length.
primaryKey | all types | bool | Specifies whether the field is part of the primary key.

A record may be nested arbitrarily deep, but its structure must be bounded so that it can be flattened into a fixed table column definition. In the derived column definition, the column names correspond to the fully qualified field names of the record, formed by concatenating the field names along the nested path. For example, a field named bar under a parent field named foo becomes the column foo_bar.
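A minimal Go sketch of this naming rule, using a hypothetical node type for nested fields and joining names with "_" as in the foo_bar example. A finite tree like this one is always bounded; a recursive schema (for example, a record that refers to itself) could not be flattened this way.

```go
package main

import "fmt"

// node is a hypothetical representation of a (possibly nested) record field.
type node struct {
	Name     string
	Type     string // leaf type; ignored when Children is non-empty
	Children []node
}

// flatten returns the column names for the leaf fields, joining
// each nested path with "_" so bar under foo becomes foo_bar.
func flatten(prefix string, fields []node) []string {
	var cols []string
	for _, f := range fields {
		name := f.Name
		if prefix != "" {
			name = prefix + "_" + f.Name
		}
		if len(f.Children) > 0 {
			cols = append(cols, flatten(name, f.Children)...)
			continue
		}
		cols = append(cols, name)
	}
	return cols
}

func main() {
	rec := []node{
		{Name: "id", Type: "int"},
		{Name: "foo", Children: []node{{Name: "bar", Type: "string"}, {Name: "baz", Type: "long"}}},
	}
	fmt.Println(flatten("", rec)) // [id foo_bar foo_baz]
}
```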

When using CSV records, each record is represented by one CSV line. The first line may contain the field (column) names; this is commonly known as the header line. If a header line is present, the order of the columns does not need to match the order of the fields defined in the Avro schema, and fields may be omitted if they have no value. If no header line is present, the order of the values must match the order of the fields defined in the Avro schema.
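The Go sketch below shows both CSV cases with the standard encoding/csv package. toRecords is a hypothetical helper that returns each record as a map from field name to raw string value; type conversion and schema validation are left out, and extra positional values are simply ignored.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// toRecords maps CSV rows onto the ordered Avro field names (hypothetical helper).
// With a header, values are matched by column name and omitted fields stay absent;
// without one, the n-th value goes to the n-th field.
func toRecords(data string, fields []string, headerIncluded bool) ([]map[string]string, error) {
	rows, err := csv.NewReader(strings.NewReader(data)).ReadAll()
	if err != nil {
		return nil, err
	}
	names := fields
	if headerIncluded && len(rows) > 0 {
		names, rows = rows[0], rows[1:] // header line supplies the column order
	}
	var out []map[string]string
	for _, row := range rows {
		rec := make(map[string]string, len(row))
		for i, v := range row {
			if i < len(names) {
				rec[names[i]] = v
			}
		}
		out = append(out, rec)
	}
	return out, nil
}

func main() {
	fields := []string{"idx", "code", "magnitude"}
	noHeader, _ := toRecords("1,A1,0.5\n", fields, false)
	withHeader, _ := toRecords("magnitude,idx\n0.7,2\n", fields, true)
	fmt.Println(noHeader[0]["code"], withHeader[0]["idx"]) // A1 2
	_, ok := withHeader[0]["code"]
	fmt.Println(ok) // false: code is omitted by the header
}
```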

When using JSON records, records are represented as a JSON array of either maps or arrays. With maps, each key-value pair in a map is assigned to the field of the same name. With arrays, the structure is similar to CSV: each array of values represents one record, and the first array may represent the header line. The JSON records may be given either as a structured object (an array of maps or arrays) or in serialized form.
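A Go sketch of decoding both JSON layouts from their serialized form. decodeJSONRecords and its headerIncluded flag are hypothetical; how the operator itself detects a header array is not described here. Note that encoding/json decodes JSON numbers into float64 when the target is interface{}.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeJSONRecords accepts an array of maps keyed by field name, or an array of
// arrays in field order whose first array is a header when headerIncluded is true.
func decodeJSONRecords(data []byte, fields []string, headerIncluded bool) ([]map[string]interface{}, error) {
	var raw []json.RawMessage
	if err := json.Unmarshal(data, &raw); err != nil {
		return nil, err
	}
	names := fields
	var out []map[string]interface{}
	for i, r := range raw {
		// Map layout: keys are the field names.
		var m map[string]interface{}
		if err := json.Unmarshal(r, &m); err == nil {
			out = append(out, m)
			continue
		}
		// Array layout: values follow field (or header) order.
		var arr []interface{}
		if err := json.Unmarshal(r, &arr); err != nil {
			return nil, fmt.Errorf("record %d is neither a map nor an array", i)
		}
		if i == 0 && headerIncluded {
			names = make([]string, len(arr))
			for j, h := range arr {
				names[j] = fmt.Sprint(h)
			}
			continue
		}
		rec := make(map[string]interface{}, len(arr))
		for j, v := range arr {
			if j < len(names) {
				rec[names[j]] = v
			}
		}
		out = append(out, rec)
	}
	return out, nil
}

func main() {
	fields := []string{"idx", "code"}
	maps, _ := decodeJSONRecords([]byte(`[{"idx":1,"code":"A1"}]`), fields, false)
	arrays, _ := decodeJSONRecords([]byte(`[["code","idx"],["B2",2]]`), fields, true)
	fmt.Println(maps[0]["code"], arrays[0]["code"], arrays[0]["idx"]) // A1 B2 2
}
```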

The body of the decoded message is an array of records, where each record is an array of Go-typed values.

In addition, the decoded message carries metadata in the attribute "vora.record.definition". This metadata has the following properties, which describe the types in the decoded message and determine the target database column types.

Property Name | go type | js type | Description
recName | string | string | The record name, which is used to derive the table name (e.g., "edevice_record")
fieldNames | []string | string[] | The field names (e.g., ["idx", "code", "magnitude"])
fieldTypes | []string | string[] | The field types (e.g., ["int", "string", "double"])
fieldNillables | []bool | boolean[] | Whether each field is nillable (e.g., [false, false, false])
fieldPrimaryKeys | []bool | boolean[] | Whether each field is part of the primary key (e.g., [true, false, false])
colNames | []string | string[] | The table column names (e.g., ["idx", "code", "magnitude"])
colTypes | []string | string[] | The table column types (e.g., ["INTEGER", "VARCHAR(25)", "DOUBLE"])

If the body of the decoded message is modified or rearranged, the above metadata must be adjusted to match the modified body.
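The Go sketch below models the metadata as a struct and removes one column from both the body and the metadata so that they stay aligned. recordDefinition and dropColumn are hypothetical, the sketch assumes that field i corresponds to column i, and the generic helper requires Go 1.18 or later.

```go
package main

import "fmt"

// recordDefinition mirrors the properties of the "vora.record.definition" attribute.
// The struct and its JSON tags are illustrative, not the operator's own type.
type recordDefinition struct {
	RecName          string   `json:"recName"`
	FieldNames       []string `json:"fieldNames"`
	FieldTypes       []string `json:"fieldTypes"`
	FieldNillables   []bool   `json:"fieldNillables"`
	FieldPrimaryKeys []bool   `json:"fieldPrimaryKeys"`
	ColNames         []string `json:"colNames"`
	ColTypes         []string `json:"colTypes"`
}

// removeAt returns a copy of s without element i; s itself is not modified.
func removeAt[T any](s []T, i int) []T {
	return append(s[:i:i], s[i+1:]...)
}

// dropColumn removes column i from every record and from each per-field slice
// of def. ASSUMPTION: field i and column i describe the same value.
func dropColumn(body [][]interface{}, def *recordDefinition, i int) [][]interface{} {
	out := make([][]interface{}, len(body))
	for r, rec := range body {
		out[r] = removeAt(rec, i)
	}
	def.FieldNames = removeAt(def.FieldNames, i)
	def.FieldTypes = removeAt(def.FieldTypes, i)
	def.FieldNillables = removeAt(def.FieldNillables, i)
	def.FieldPrimaryKeys = removeAt(def.FieldPrimaryKeys, i)
	def.ColNames = removeAt(def.ColNames, i)
	def.ColTypes = removeAt(def.ColTypes, i)
	return out
}

func main() {
	def := recordDefinition{
		RecName:          "edevice_record",
		FieldNames:       []string{"idx", "code", "magnitude"},
		FieldTypes:       []string{"int", "string", "double"},
		FieldNillables:   []bool{false, false, false},
		FieldPrimaryKeys: []bool{true, false, false},
		ColNames:         []string{"idx", "code", "magnitude"},
		ColTypes:         []string{"INTEGER", "VARCHAR(25)", "DOUBLE"},
	}
	body := [][]interface{}{{int32(1), "A1", 0.5}}
	body = dropColumn(body, &def, 1) // drop the "code" column
	fmt.Println(body, def.ColNames, def.ColTypes)
}
```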

Sample Graph | Description
com.sap.demo.vora.ingestion.avro_ingestion_example_disk | Reads Avro messages from Kafka and writes them to the Vora disk engine.
com.sap.demo.vora.ingestion.avro_ingestion_example_series | Reads Avro messages from Kafka and writes them to the Vora time series engine.
com.sap.demo.vora.ingestion.csv_ingestion_example2_disk | Generates CSV messages and writes them to the Vora disk engine.
com.sap.demo.vora.ingestion.csv_ingestion_example2_series | Generates CSV messages and writes them to the Vora time series engine.
com.sap.demo.vora.ingestion.csv_ingestion_example3_disk | Generates CSV messages and writes them to the Vora disk engine, using an Avro schema with additional metadata to customize the table.
com.sap.demo.vora.ingestion.json_ingestion_example2_disk | Generates JSON messages and writes them to the Vora disk engine.

Configuration Parameters

Parameter | Type | Description
defaultAvroSchema | string | The default Avro schema, used when the incoming message does not include its own schema. This parameter is required for Avro messages without an embedded schema and for CSV and JSON messages, unless the message supplies the schema in the "avro.schema" attribute. Default: ""
varLimit | integer | The default size limit for VARCHAR columns; 0 means unlimited, i.e., VARCHAR(*). Default: 0
format | string | The input message format. Accepted values are "avro", "csv", and "json". Default: "avro"
csvComma | rune | The delimiter character code for the CSV format, for example 44 for ',', 59 for ';', or 124 for '|'. Default: ','
csvHeaderIncluded | bool | Whether the input CSV message contains a header line. Default: false

Input

Input | Type | Description
in | message | Accepts messages whose body contains Avro, CSV, or JSON data.

Output

Output | Type | Description
out | message | Messages with record objects in the body and an attribute vora.record.definition that contains the structure definition of the records.