Modeling Guide for SAP Data Hub


Subengines in the SAP Data Hub Modeler allow operators to be implemented for runtimes other than the main engine. The main engine is the process that coordinates graph execution and also executes native operators.

Subengines allow a single graph to contain operators that run in the main engine, operators that are implemented in Python and run in the Python subengine, and operators that are implemented in C++ and run in the C++ subengine. Some operators have implementations for more than one engine (the term engine here means either the main engine or a subengine). In this case, you can select, in the operator's configuration panel, the subset of available engines from which the optimizer can choose. The optimizer assigns an engine to each operator in a way that tries to minimize the number of edges that cross between different engines. A cluster of connected operators scheduled to run in the same engine all run in the same OS process.
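The engine assignment described above can be illustrated with a small sketch. This is not the Modeler's actual optimizer, whose algorithm is not documented here. It is a brute-force illustration that assumes a hypothetical four-operator graph, hypothetical engine labels ("main", "python3.6"), and hypothetical function names (assign_engines, group_into_processes).

```python
from itertools import product


def assign_engines(allowed, edges):
    """Pick one engine per operator so that as few edges as possible
    connect operators running in different engines (exhaustive search)."""
    names = sorted(allowed)
    best, best_cost = None, None
    for choice in product(*(sorted(allowed[n]) for n in names)):
        assignment = dict(zip(names, choice))
        # Count edges whose two endpoints ended up in different engines.
        cost = sum(1 for a, b in edges if assignment[a] != assignment[b])
        if best_cost is None or cost < best_cost:
            best, best_cost = assignment, cost
    return best, best_cost


def group_into_processes(assignment, edges):
    """Group connected operators that share an engine; each group stands
    for one OS process."""
    parent = {op: op for op in assignment}

    def find(op):
        while parent[op] != op:
            parent[op] = parent[parent[op]]
            op = parent[op]
        return op

    for a, b in edges:
        if assignment[a] == assignment[b]:
            parent[find(a)] = find(b)
    groups = {}
    for op in assignment:
        groups.setdefault(find(op), set()).add(op)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical graph: operator name -> engines selected in its configuration.
allowed = {
    "reader": {"main"},
    "parser": {"main", "python3.6"},
    "scorer": {"python3.6"},
    "writer": {"main", "python3.6"},
}
edges = [("reader", "parser"), ("parser", "scorer"), ("scorer", "writer")]

assignment, crossings = assign_engines(allowed, edges)
processes = group_into_processes(assignment, edges)
print(assignment, crossings, processes)
```

In this example, at least one edge has to cross engines because "reader" is only available in the main engine and "scorer" only in Python. The exhaustive search is practical only for tiny graphs; it is used here because it makes the objective easy to see.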

The currently available subengines are: C++, Python2.7, and Python3.6.

Some advantages of subengines are:
  • Connected operators that belong to the same subengine can be run in a single process. This is better than using the ProcessExecutor Operator to execute external scripts, which launches a new process for each operator.

  • Scriptable operators in different languages: Python2 Operator, Python3 Operator.
    • The scripts for these operators can be edited in the UI, so you don't need to handle the serialization and deserialization of outgoing and incoming data yourself.

  • Subengines make it possible for SAP to develop and deliver operators implemented in programming languages other than the one used in the main engine.

  • You can create and add new operators to the SAP Data Hub Modeler in different programming languages.

    • For example, even though the C++ subengine has no script operator, you can still develop new operators in C++ on your local machine, compile them, and then upload a package with the generated .so files to the cluster through SAP Data Hub System Management. More information can be found in the C++ section.

    • For the Python subengines, although you can extend the Python2 Operator and Python3 Operator script operators with new behavior, you can also develop new operators on your own machine and then upload them to the cluster via SAP Data Hub System Management.

      More information can be found in the Python subengines section.
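
As a rough illustration of the script operators mentioned above, the following sketch shows what a small Python3 Operator script might look like. It assumes that the subengine provides a global api object with set_port_callback and send, and that the operator has ports named "input" and "output"; the port names and the StubApi class are hypothetical and exist only so that the sketch can also be run outside the Modeler.

```python
class StubApi:
    """Hypothetical local stand-in for the `api` object that the Python
    subengine is assumed to inject into an operator script."""

    def __init__(self):
        self.callbacks = {}
        self.sent = []

    def set_port_callback(self, port, callback):
        self.callbacks[port] = callback

    def send(self, port, data):
        self.sent.append((port, data))


try:
    api  # assumed to be provided when the script runs inside the Modeler
except NameError:
    api = StubApi()  # local fallback for trying the script out


def on_input(data):
    # The script receives a ready-to-use Python value and sends one back;
    # no manual serialization code appears in the script.
    api.send("output", data.upper())


# Port names are assumptions for this sketch.
api.set_port_callback("input", on_input)

if isinstance(api, StubApi):
    # Simulate one message arriving on the input port.
    api.callbacks["input"]("hello")
    print(api.sent)
```

Only the on_input function and the set_port_callback line would belong in the script edited in the UI; everything related to StubApi is scaffolding for local experimentation.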