Introduction to the SAP Data Hub Modeler

The SAP Data Hub Modeler tool is based on the SAP Pipeline Engine, which uses a flow-based programming paradigm to create data processing pipelines (graphs).

Big Data applications require advanced data ingestion and transformation capabilities. Some common use cases are to:

Ingest data from source systems, for example, database systems such as SAP HANA, message queues such as Apache Kafka, or data storage systems such as HDFS or S3.
Cleanse the data.
Transform the data to a desired target schema.
Store the data in target systems for consumption, archiving, or analysis.

To achieve these data ingestion and transformation capabilities, users can model a data processing pipeline as a computation graph. In this graph, nodes represent operations on the data, while edges represent the data flow between those operations.
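
The idea of nodes as operations and edges as data flow can be sketched in plain Python. This is only an illustration of the concept, not the Modeler's API: the Graph class, its methods, and the node names (ingest, cleanse, transform, store) are hypothetical and simply mirror the use cases listed above. The sketch assumes an acyclic graph and pushes a single record through it.

```python
from collections import defaultdict, deque


class Graph:
    """Hypothetical minimal flow graph: nodes are operations, edges carry data."""

    def __init__(self):
        self.operations = {}            # node name -> callable applied to the data
        self.edges = defaultdict(list)  # node name -> names of downstream nodes

    def add_node(self, name, operation):
        self.operations[name] = operation

    def connect(self, source, target):
        self.edges[source].append(target)

    def run(self, start, record):
        """Push one record into `start` and let it flow along the edges.

        Assumes the graph is acyclic; a cycle would make this loop forever.
        """
        results = {}
        pending = deque([(start, record)])
        while pending:
            name, value = pending.popleft()
            output = self.operations[name](value)
            results[name] = output
            for target in self.edges[name]:
                pending.append((target, output))
        return results


graph = Graph()
graph.add_node("ingest", lambda raw: raw.split(","))
graph.add_node("cleanse", lambda fields: [f.strip() for f in fields if f.strip()])
graph.add_node("transform", lambda fields: {"name": fields[0], "city": fields[1]})
graph.add_node("store", lambda row: f"stored {row['name']}")
graph.connect("ingest", "cleanse")
graph.connect("cleanse", "transform")
graph.connect("transform", "store")

results = graph.run("ingest", " Ada , London ,, ")
print(results["transform"])  # {'name': 'Ada', 'city': 'London'}
print(results["store"])      # stored Ada
```

Each lambda stands in for an operator, and each connect call stands in for an edge drawn between two operators in the graphical editor.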

The SAP Data Hub Modeler tool helps users to graphically model and execute a graph. The tool also provides a runtime component to execute graphs in a containerized environment that runs on Kubernetes.

The SAP Data Hub Modeler tool provides certain predefined operators for productive use cases. These operators can help users define graphs, including non-terminating, non-connected, or cyclic graphs. The following example shows a simple interaction with Apache Kafka. The graph consists of two subgraphs. The first subgraph generates some data and writes it to a Kafka message queue, while the second subgraph reads the data from Kafka, converts it to a string, and prints it to a terminal.
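
The shape of such a two-subgraph layout can be approximated in plain Python as a rough sketch. It makes several assumptions: an in-memory queue.Queue stands in for the Kafka message queue, the function names are hypothetical, and a sentinel value ends the run so that the example terminates, whereas a real graph of this kind may run indefinitely. No Kafka client or Modeler operator is used here.

```python
import queue
import threading

topic = queue.Queue()  # assumption: in-memory stand-in for a Kafka topic
SENTINEL = None        # assumption: marks end of data so the sketch terminates
printed = []           # records what the second subgraph wrote to the terminal


def producer_subgraph(count):
    """First subgraph: generate some data and write it to the queue as bytes."""
    for i in range(count):
        topic.put(f"message-{i}".encode("utf-8"))
    topic.put(SENTINEL)


def consumer_subgraph():
    """Second subgraph: read from the queue, convert to string, print."""
    while True:
        payload = topic.get()
        if payload is SENTINEL:
            break
        text = payload.decode("utf-8")  # the "convert to string" step
        printed.append(text)
        print(text)


# The two subgraphs are not connected to each other directly;
# they only share the queue, and each runs independently.
consumer = threading.Thread(target=consumer_subgraph)
producer = threading.Thread(target=producer_subgraph, args=(3,))
consumer.start()
producer.start()
producer.join()
consumer.join()
```

Because the two functions share nothing but the queue, either side could be replaced or scaled without changing the other, which is the property the non-connected subgraphs illustrate.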

You can also create generic data processing pipelines. The following example shows a graph that detects objects in a video stream.