Modeling Guide for SAP Data Hub

Using Python Libraries

To make Python libraries available to your operator, add tags to the operator so that the appropriate Docker image is chosen when the graph is executed.

You can create or extend dockerfiles through the UI and add tags to them so that they can be associated with an operator. The following example details the necessary steps.
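
The tag-based association described above can be pictured with a small Python sketch. This is only an illustrative model, not the Modeler's actual selection logic: the function names `satisfies` and `select_dockerfile`, the dockerfile names, and the rule that an empty version string accepts any version are all assumptions made for the example.

```python
# Illustrative model only; the Modeler's real selection logic is not shown in this guide.

def satisfies(offered, required):
    """Return True if every required tag is offered with a compatible version."""
    for name, version in required.items():
        if name not in offered:
            return False
        # Assumption: an empty version string means "any version is acceptable".
        if version and offered[name] != version:
            return False
    return True


def select_dockerfile(dockerfiles, operator_tags):
    """Return the name of the first dockerfile whose tags cover operator_tags, or None."""
    for name, tags in dockerfiles.items():
        if satisfies(tags, operator_tags):
            return name
    return None


# Hypothetical dockerfile names, each mapped to the tags it offers.
dockerfiles = {
    "python2-base": {"tornado": "5.0.2", "opensuse": "", "python27": ""},
    "numpy-python2": {
        "tornado": "5.0.2",
        "opensuse": "",
        "python27": "",
        "numpy27": "1.16.1",
    },
}

# Tags placed on a custom Python operator that needs numpy 1.16.1.
operator_tags = {"python27": "", "tornado": "5.0.2", "numpy27": "1.16.1"}

chosen = select_dockerfile(dockerfiles, operator_tags)
print(chosen)
```

In this model, an operator without the numpy27 tag would be matched by the base dockerfile, while an operator asking for a numpy version that no dockerfile offers would get no match at all.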

Suppose that we want to use the numpy library (version 1.16.1) in a custom Python operator.

First, we need to create a new dockerfile:
  1. Open the Repository tab and then select a sub-folder.
  2. Right-click it and select Create Docker File.
  3. Provide a name and choose OK.

    There are many ways to create a dockerfile containing Python and the numpy library. For instance, you can write "FROM <some_public_numpy_docker_image>" on the first line. If you use a public image, make sure that it also meets the requirements for running the Python sub-engine: tornado==5.0.2, openSUSE, and either Python 2.7 or Python 3.6. The tag key-value pairs for those requirements are 'tornado': '5.0.2', 'opensuse': '', and either 'python27': '' or 'python36': ''.

    Alternatively, we can inherit from the Modeler's default python2 dockerfile and add numpy on top of it. This way, all the requirements of the standard Python2 Operator are already satisfied. The dockerfile below shows one way to accomplish this:

    FROM $com.sap.opensuse.base

    RUN pip install numpy=="1.16.1"

    If you want to use a Python3 Operator, your dockerfile can be similar to the one below:

    FROM $com.sap.opensuse.base

    RUN python3.6 -m pip install numpy=="1.16.1"

  4. After writing the dockerfile content, add tags in the configuration panel for every relevant resource that this dockerfile offers in addition to those of its parent dockerfile. You don't need to repeat the tags that the parent dockerfile already has. For example, the numpy-python2 dockerfile needs only the tag numpy27: 1.16.1, and the numpy-python3 dockerfile needs only the tag numpy36: 1.16.1.

  5. The final step is to add the new tag to your operator in the operator editor view: add the tag "numpy27": "1.16.1" or "numpy36": "1.16.1" to your Python operator. Alternatively, you can add the tags to a group defined in the graph.
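
Step 4 relies on tags being inherited from the parent dockerfile. The sketch below models that idea in Python, assuming that a dockerfile's effective tags are its parent's tags plus its own. The data layout, the dockerfile names, the helper `effective_tags`, and the assumption that the parent carries the sub-engine tags listed in step 3 are hypothetical and only serve the illustration.

```python
def effective_tags(name, dockerfiles):
    """Return the tags a dockerfile offers, including those of its parents."""
    entry = dockerfiles[name]
    parent = entry["parent"]
    # Start from the inherited tags (if any), then add the dockerfile's own tags.
    merged = dict(effective_tags(parent, dockerfiles)) if parent else {}
    merged.update(entry["tags"])
    return merged


# Hypothetical repository content: the child declares only what it adds.
dockerfiles = {
    "python2-base": {
        "parent": None,
        # Assumption: the parent already carries the Python sub-engine tags.
        "tags": {"tornado": "5.0.2", "opensuse": "", "python27": ""},
    },
    "numpy-python2": {
        "parent": "python2-base",
        "tags": {"numpy27": "1.16.1"},
    },
}

tags = effective_tags("numpy-python2", dockerfiles)

# Tags set on the operator in step 5, together with the standard Python2 tags.
operator_tags = {"python27": "", "tornado": "5.0.2", "numpy27": "1.16.1"}
covered = all(
    name in tags and tags[name] == version
    for name, version in operator_tags.items()
)
print(sorted(tags))
print(covered)
```

Under these assumptions, the child dockerfile covers every operator tag even though it declares only numpy27 itself, which is why the parent's tags do not have to be repeated.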

For more information about the Python2 Operator and the Python3 Operator, read their respective documentation in the SAP Data Hub Modeler UI.