Modeling Guide

JSON Ingest2 via Disk

The data generator produces a series of messages, each consisting of one or more JSON records. Each JSON record may be represented as a JSON object or a JSON value array. If a single message contains multiple records, they must all be of the same form and be wrapped in a JSON array. Each message is passed to the preingestor, which creates a message with record objects. Finally, this message is passed to the ingestor, which stores the records in the Vora disk engine and emits the commit token associated with each message.

The table definition is derived from the Avro schema configured at the preingestor, which is given as:
{
  "name": "sample_demo_deep_record",
  "type": "record",
  "fields": [
    {"name": "idx", "type": "int"},
    {"name": "code", "type": "string"},
    {"name": "magnitude", "type": "double"},
    {"name": "name", "type": "string"},
    {"name": "coordinates", "type": "record", "fields": [
      {"name": "latitude", "type": "double"},
      {"name": "longtitude", "type": "double"}]},
    {"name": "ts", "type": "long", "logicalType": "timestamp-millis"},
    {"name": "status", "type": "boolean"}
  ]
}
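To see which leaf fields the schema contributes and in what order, the schema can be walked recursively. The following is a minimal Python sketch, not the preingestor's actual logic. The function name flatten_fields and the dotted path used for nested fields are assumptions made for illustration; how Vora names the resulting table columns is not stated here.

```python
import json

# Schema copied from the text above; "longtitude" is kept exactly as spelled there.
SCHEMA = json.loads("""
{ "name": "sample_demo_deep_record", "type": "record", "fields": [
  {"name": "idx", "type": "int"},
  {"name": "code", "type": "string"},
  {"name": "magnitude", "type": "double"},
  {"name": "name", "type": "string"},
  {"name": "coordinates", "type": "record", "fields": [
    {"name": "latitude", "type": "double"},
    {"name": "longtitude", "type": "double"}]},
  {"name": "ts", "type": "long", "logicalType": "timestamp-millis"},
  {"name": "status", "type": "boolean"}] }
""")


def flatten_fields(record, prefix=""):
    """Return (path, type) pairs in declaration order, descending into nested records.

    Hypothetical helper: the dotted path for nested fields is an assumption.
    """
    columns = []
    for field in record["fields"]:
        path = prefix + field["name"]
        if field["type"] == "record":
            # Nested record: recurse and prefix its fields with the parent name.
            columns.extend(flatten_fields(field, prefix=path + "."))
        else:
            # Prefer the logical type when one is declared (e.g. timestamp-millis).
            columns.append((path, field.get("logicalType", field["type"])))
    return columns


COLUMNS = flatten_fields(SCHEMA)

if __name__ == "__main__":
    for path, type_name in COLUMNS:
        print(f"{path}: {type_name}")
```

The resulting order of the eight leaf fields matches the order of the values in the value-array form of a record.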

The data generator operator transmits a message whose body consists of one or more JSON records that match the above schema.

The message body may contain a single record or an array of records, where each record is represented as either a JSON object or a JSON value array.
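The four possible body shapes (one object, one value array, an array of objects, an array of value arrays) can be told apart with a small check. The sketch below is illustrative only and is not the preingestor's implementation; records_from_body is a hypothetical name. It assumes that the first field of a record is a scalar, as idx is in this schema, so that a top-level array whose first element is itself an array or object must be a batch of records.

```python
import json


def records_from_body(body):
    """Parse a message body and return its records as a list.

    Hypothetical helper. Each returned record is either a dict (object form)
    or a list (value-array form).
    """
    data = json.loads(body)
    if isinstance(data, dict):
        return [data]  # single record in object form
    if isinstance(data, list):
        if not data:
            return []  # empty array: no records
        if isinstance(data[0], (dict, list)):
            # Batch of records: all of them must share the same form.
            if len({type(record) for record in data}) != 1:
                raise ValueError("records in one message must share the same form")
            return data
        return [data]  # single record in value-array form
    raise ValueError("message body must be a JSON object or a JSON array")


if __name__ == "__main__":
    print(records_from_body('[[1, "a"], [2, "b"]]'))
```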

For example, a message body with records in object form can look like this:
[{"code":"iXNWM","coordinates":{"latitude":55.87086007693324,"longtitude":-41.24651001822619},"idx":6077,"magnitude":2516.2811424004744,"name":"H2VGxtYlahjBHoLcm","status":false,"ts":"2018-05-07 3:25:00.557"},
 {"code":"IZFux","coordinates":{"latitude":79.13471154134257,"longtitude":71.62190615872217},"idx":3700,"magnitude":8821.138614425376,"name":"WoamNWLcJAfSgm9m6rdujQ","status":true,"ts":"2018-05-07 3:25:00.557"},
 {"code":"EDa2M","coordinates":{"latitude":65.0152956644234,"longtitude":-167.62832185615025},"idx":530,"magnitude":9570.581735765665,"name":"a5DnGc7YqIsz","status":true,"ts":"2018-05-07 3:25:00.557"}, ...
or, with records in value-array form, like this:
[[2217,"1KQ9B",2372.7511416642337,"rRSgQyMUXqPG6UEQuMHkDLxUucg",-6.103202884364208,63.69302212520921,"2018-05-07 3:29:46.035",false],
 [1624,"kLzw3",3226.511391502805,"zbf3IqsgIXO4jt050U4AfWUI87gKK681",6.171226722764729,-11.211557814247755,"2018-05-07 3:29:46.035",true],
 [7263,"ykQG6",9657.557856232173,"KtWSrO4eEDM9THVLS",-70.42437579521655,-83.99133796607525,"2018-05-07 3:29:46.036",true], ... ]
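In the value-array form, the values follow the schema's field order, with the nested coordinates record flattened into its two values. The following Python sketch shows one way to turn such an array into the equivalent record object. It is an illustration under stated assumptions, not the preingestor's code: array_to_object and ts_to_millis are hypothetical names, and interpreting the ts string as UTC is an assumption, since the samples do not carry a time zone.

```python
from datetime import datetime, timedelta, timezone

# Leaf fields in schema order; a two-element path marks a field nested in "coordinates".
FIELD_PATHS = [
    ("idx",), ("code",), ("magnitude",), ("name",),
    ("coordinates", "latitude"), ("coordinates", "longtitude"),
    ("ts",), ("status",),
]

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def array_to_object(values):
    """Map a value array onto the schema's field names (hypothetical helper)."""
    if len(values) != len(FIELD_PATHS):
        raise ValueError(f"expected {len(FIELD_PATHS)} values, got {len(values)}")
    record = {}
    for path, value in zip(FIELD_PATHS, values):
        target = record
        for key in path[:-1]:
            # Create the nested object on first use.
            target = target.setdefault(key, {})
        target[path[-1]] = value
    return record


def ts_to_millis(text):
    """Convert 'YYYY-MM-DD H:MM:SS.mmm' to epoch milliseconds, assuming UTC."""
    parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f").replace(tzinfo=timezone.utc)
    # Integer division of timedeltas avoids floating-point rounding.
    return (parsed - EPOCH) // timedelta(milliseconds=1)


if __name__ == "__main__":
    row = [2217, "1KQ9B", 2372.7511416642337, "rRSgQyMUXqPG6UEQuMHkDLxUucg",
           -6.103202884364208, 63.69302212520921, "2018-05-07 3:29:46.035", False]
    print(array_to_object(row))
    print(ts_to_millis(row[6]))
```

The timestamp helper is kept separate from the mapping because the text does not say at which stage, if any, the string is converted to the schema's timestamp-millis value.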

Prerequisites

You need a running SAP Vora instance.

Configure and Run the Graph

Follow the steps below to run the example from the Data Pipeline UI:
  1. In the left panel, select the Graphs tab and navigate to com/sap/demo/vora/ingestion/json_ingestion_example2_disk.
  2. Check the dsn setting in the configuration of the ingestor node.
  3. In the tool bar, select Run (play button).
  4. The Status panel indicates if the graph is running.
  5. From the context menu of the Wiretap node, select Open UI to open the wiretap.
  6. The wiretap opens and you see the commit tokens.
  7. Stop the graph, change the generator's batchSize, and run the graph again.