Modeling Guide for SAP Data Hub

Introduction to the SAP Data Hub Modeler

The SAP Data Hub Modeler is based on the Pipeline Engine, which uses a flow-based programming paradigm to create data processing pipelines (graphs).
Big Data applications require advanced data ingestion and transformation capabilities. Some common use cases are to:
  • Ingest data from source systems. For example, database systems like SAP HANA, message queues like Apache Kafka, or data storage systems like HDFS or S3.
  • Cleanse the data.
  • Transform the data to a desired target schema.
  • Store the data in target systems for consumption, archiving, or analysis.

You can model a data processing pipeline as a computation graph, which provides the required data ingestion and transformation capabilities. In this graph, nodes represent operations on the data, while edges represent the data flow.
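The computation-graph idea can be sketched in a few lines of Python. This is an illustrative stand-in, not the Modeler's actual API: the names `Graph`, `add_node`, `connect`, and `run` are invented for this example, and the three nodes mirror the ingest/cleanse/store steps from the use cases above.

```python
from collections import deque

# Illustrative sketch only: nodes are operations on the data,
# edges carry one node's output to the next node's input.
# These class and method names are NOT the SAP Data Hub Modeler API.

class Graph:
    def __init__(self):
        self.ops = {}    # node name -> callable (the operation)
        self.edges = {}  # node name -> list of downstream node names

    def add_node(self, name, op):
        self.ops[name] = op
        self.edges.setdefault(name, [])

    def connect(self, src, dst):
        self.edges[src].append(dst)

    def run(self, start, payload):
        # Push data along the edges in breadth-first order.
        results = {}
        pending = deque([(start, payload)])
        while pending:
            name, data = pending.popleft()
            out = self.ops[name](data)
            results[name] = out
            for nxt in self.edges[name]:
                pending.append((nxt, out))
        return results

g = Graph()
g.add_node("ingest", lambda _: [" Alice ", "BOB", None])  # source system
g.add_node("cleanse", lambda rows: [r.strip().lower() for r in rows if r])
g.add_node("store", lambda rows: {"stored": rows})        # target system
g.connect("ingest", "cleanse")
g.connect("cleanse", "store")

print(g.run("ingest", None)["store"])  # {'stored': ['alice', 'bob']}
```

Running the graph from the source node propagates each node's output to its downstream neighbors, which is the essence of the flow-based model: the operations stay independent and only the edges define how data moves between them.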

The Modeler application lets you model graphs graphically and execute them. It provides a runtime component that executes graphs in a containerized environment running on Kubernetes.

The Modeler also provides predefined operators, which you can use for many productive business use cases. These operators can help you define graphs, including non-terminating, non-connected, or cyclic graphs. The following example shows a simple interaction with Apache Kafka. The graph consists of two subgraphs. The first subgraph generates some data and writes the data into a Kafka message queue. The second subgraph reads the data from Kafka, converts it to a string, and prints the data to a terminal.
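The two subgraphs described above can be sketched as follows. This is a minimal stand-in, assuming an in-memory `queue.Queue` in place of the Kafka topic; a real graph would use the Modeler's predefined Kafka producer and consumer operators, and the function names here are invented for illustration.

```python
import queue

# In-memory stand-in for the Kafka message queue; a real pipeline
# would use the Modeler's Kafka operators against an actual broker.
topic = queue.Queue()

# Subgraph 1: generate some data and write it to the "topic".
def producer_subgraph(count):
    for i in range(count):
        topic.put(str(i).encode())  # messages travel as raw bytes

# Subgraph 2: read from the "topic", convert each message to a
# string, and print it to the terminal.
def consumer_subgraph():
    lines = []
    while not topic.empty():
        msg = topic.get()
        lines.append(msg.decode())  # the "to string" conversion step
        print(lines[-1])
    return lines

producer_subgraph(3)
consumer_subgraph()  # prints 0, 1, 2
```

Note that the two subgraphs share no edge in the graph itself; they communicate only through the message queue, which is why the Modeler must support non-connected graphs like this one.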

You can also create generic data processing pipelines. The following example shows a graph that detects objects in a video stream.