Modeling Guide for SAP Data Hub

Using Python Libraries

To make Python libraries available to your operator, add tags to the operator so that the right Docker image is chosen when the graph is executed.
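The idea of choosing an image by tags can be illustrated with a small Python sketch. This is not the Modeler's actual selection logic, which is not described here. It assumes that a dockerfile qualifies when it offers every tag the operator requires, and that an empty version string on a required tag means "any version". The function names and the dockerfile names are hypothetical; the tag names follow the example in this section.

```python
def image_satisfies(required_tags, offered_tags):
    """Return True if every required tag is offered with a compatible version.

    Assumption: an empty version string on a required tag means "any version".
    """
    for name, version in required_tags.items():
        if name not in offered_tags:
            return False
        if version and offered_tags[name] != version:
            return False
    return True


def choose_dockerfile(required_tags, dockerfiles):
    """Return the name of the first dockerfile whose tags cover the requirement."""
    for name, offered_tags in dockerfiles.items():
        if image_satisfies(required_tags, offered_tags):
            return name
    return None


# Hypothetical dockerfile names mapped to the tags they offer.
dockerfiles = {
    "numpy-python2": {"python27": "", "debian": "", "numpy27": "1.16.1"},
    "numpy-python3": {"python36": "", "debian": "", "numpy36": "1.16.1"},
}

# Tags required by a Python3 operator that needs numpy 1.16.1.
operator_tags = {"numpy36": "1.16.1"}
selected = choose_dockerfile(operator_tags, dockerfiles)
print(selected)  # prints "numpy-python3"
```

Under these assumptions, an operator tagged with a version that no dockerfile offers would match nothing, which is why the tag on the operator and the tag on the dockerfile need to agree.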

You can create or extend dockerfiles through the UI and add tags to them so that they can be associated with an operator. The following example details the necessary steps.

Suppose that we want to use the numpy library (version 1.16.1) in our custom Python operator.

First, we need to create a new dockerfile.
  1. Open the Repository tab and then select a sub-folder.

  2. Right-click on it, select Create Docker File, choose a name and press Ok.

    There are many ways to create a dockerfile containing Python and the numpy library. You can, for instance, write "FROM <some_public_numpy_docker_image>" on the first line. Here, however, we show how to inherit from the Modeler's default python2 dockerfile and add numpy on top of it. The dockerfile below shows one way to accomplish this:
    FROM $com.sap.python27
    
    RUN apt-get update && apt-get install -y python-pip
    
    RUN pip install numpy=="1.16.1"
    
    If you want to use a Python3 Operator, your dockerfile can be similar to the one below:
    FROM $com.sap.python36
    
    ENV PYTHON_PIP_VERSION 19.0.1
    
    RUN set -ex; \
    	\
    	savedAptMark="$(apt-mark showmanual)"; \
    	apt-get update; \
    	apt-get install -y --no-install-recommends wget; \
    	\
    	wget -O get-pip.py 'https://bootstrap.pypa.io/get-pip.py'; \
    	\
    	apt-mark auto '.*' > /dev/null; \
    	[ -z "$savedAptMark" ] || apt-mark manual $savedAptMark; \
    	apt-get purge -y --auto-remove -o APT::AutoRemove::RecommendsImportant=false; \
    	rm -rf /var/lib/apt/lists/*; \
    	\
    	python get-pip.py \
    		--disable-pip-version-check \
    		--no-cache-dir \
    		"pip==$PYTHON_PIP_VERSION" \
    	; \
    	pip --version; \
    	\
    	find /usr/local -depth \
    		\( \
    			\( -type d -a \( -name test -o -name tests \) \) \
    			-o \
    			\( -type f -a \( -name '*.pyc' -o -name '*.pyo' \) \) \
    		\) -exec rm -rf '{}' +; \
    	rm -f get-pip.py
    
    RUN pip3.6 install numpy=="1.16.1"

  3. After writing the dockerfile content, add tags in the configuration panel for every relevant resource that this dockerfile offers. When inheriting from another dockerfile defined in the Modeler, make sure that your new dockerfile includes all the tags of its parent. For example, for the numpy-python2 dockerfile the tags would be the following:
        "python27": "",
        "debian": "",
        "numpy27": "1.16.1"
    For the numpy-python3 dockerfile the tags would be:
        "python36": "",
        "debian": "",
        "numpy36": "1.16.1"
  4. The final step is to add the new tag to your operator in the operator editor view. Add the tag "numpy27": "1.16.1" or "numpy36": "1.16.1" to your Python operator. Alternatively, you can add tags to a group defined in the graph.
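Step 3 requires a derived dockerfile to repeat all the tags of its parent. Before entering the tags in the configuration panel, this can be checked with a few lines of Python. The sketch below is only an illustration: the helper name is hypothetical, and the parent tags are an assumption inferred from the numpy-python2 example (the tags of the default python2 dockerfile are not listed in this section).

```python
def missing_parent_tags(parent_tags, child_tags):
    """Return the parent tags that the child dockerfile does not repeat."""
    return {
        name: version
        for name, version in parent_tags.items()
        if child_tags.get(name) != version
    }


# Assumed tags of the parent python2 dockerfile (inferred from the example).
parent = {"python27": "", "debian": ""}

# Tags entered for the derived dockerfile; "debian" was forgotten here.
child = {"python27": "", "numpy27": "1.16.1"}

missing = missing_parent_tags(parent, child)
print(missing)  # prints {'debian': ''}
```

An empty result means the derived dockerfile carries every parent tag; any entry in the result is a tag that still needs to be added.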

For more information about the Python2 Operator and Python3 Operator, read their respective documentation in the SAP Data Hub Modeler UI.