Modeling Guide for SAP Data Hub

Port Types

An operator uses ports as its interface for communicating with other operators in a graph.

A port definition includes the following:
  • Input or output port. There are no dedicated error ports; use output ports to communicate error messages.
  • Name of the port: A unique port name that consists of alphanumeric characters only.
  • Port type.
A port type is a string with a defined structure: a mandatory base type and an optional semantic type. The semantic type can have a hierarchical substructure, separated by periods, with an optional wildcard at the end. A port type has the following form: <base type>.<semantic type>.

Types with a wildcard are called incomplete types. A general port type specification may look like.

You can use the semantic part of the type specification to enrich the types with additional information that the owner of the types can use. However, the engine does not evaluate the semantic part beyond the compatibility checks described below.
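The structure of a port type string can be illustrated with a short Python sketch that splits a specification into its base type, semantic type, and wildcard flag. This is not part of SAP Data Hub: the names parse_port_type and PortType, the use of `*` as the wildcard character, and the example type strings are assumptions made for illustration only.

```python
from typing import NamedTuple, Optional

# Built-in base types listed in this guide.
BASE_TYPES = {"any", "string", "blob", "int64", "float64", "byte", "message", "stream"}

WILDCARD = "*"  # assumption: the guide does not name the wildcard character


class PortType(NamedTuple):
    base: str                # mandatory base type
    semantic: Optional[str]  # optional semantic type, without the wildcard
    incomplete: bool         # True if the type ends with a wildcard


def parse_port_type(spec: str) -> PortType:
    """Split '<base type>.<semantic type>' into its parts (hypothetical helper)."""
    parts = spec.split(".")
    base, rest = parts[0], parts[1:]
    if base not in BASE_TYPES:
        raise ValueError(f"unknown base type: {base!r}")
    # A wildcard is only allowed as the last segment.
    incomplete = bool(rest) and rest[-1] == WILDCARD
    if incomplete:
        rest = rest[:-1]
    if any(segment in ("", WILDCARD) for segment in rest):
        raise ValueError(f"malformed semantic type in {spec!r}")
    semantic = ".".join(rest) if rest else None
    return PortType(base, semantic, incomplete)
```

With these assumptions, parse_port_type("message.sensor.temperature.*") yields the base type message, the semantic type sensor.temperature, and marks the type as incomplete, while parse_port_type("string") yields a complete type without a semantic part.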

Base Types

All types that you can use to type a port are technically reduced to one of the following built-in base types:

Name     Description                      Is compatible with type "any"  Array
any      generic type                     yes                            no
string   character sequence               yes                            yes
blob     binary large object              yes                            yes
int64    8 byte signed integer            yes                            yes
float64  8 byte decimal number            yes                            yes
byte     single character                 yes                            yes
message  structure with header and body   no                             no
stream   unstructured data stream         no                             no

The last column in the table indicates whether it is possible to use arrays of the type. For example, you can use []float64 but not []message.
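The array rule from the last column of the table can be expressed as a small lookup. The following Python sketch is illustrative only; the function name is_valid_array_type is hypothetical, and the `[]` prefix notation is taken from the []float64 example above.

```python
# Base types that may be used as arrays, according to the base type table.
ARRAY_CAPABLE = {"string", "blob", "int64", "float64", "byte"}


def is_valid_array_type(spec: str) -> bool:
    """Return True if spec has the form '[]<base type>' and the base type supports arrays."""
    if not spec.startswith("[]"):
        return False
    # Ignore any semantic type that follows the base type.
    base = spec[2:].split(".", 1)[0]
    return base in ARRAY_CAPABLE
```

For example, is_valid_array_type("[]float64") returns True, whereas is_valid_array_type("[]message") and is_valid_array_type("[]any") return False.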

Some use cases for pipeline-specific types are:
  • The type any can be used if an operator is agnostic of the type. This avoids redefining the operator for each type. Typical examples are the multiplexer operators.
  • The type message consists of a message header and a payload stored in the body. Messages currently have a size limit of 10 MB, so larger payloads have to be split into chunks. An example is the Read File operator, where the header of each response message contains the information needed to interpret the content of the body. In other scenarios, the input message triggers an operator to transfer data specified by the message, and the output message carries the result of this operation; the Copy File operator is an example. The header information can then be used to match requests with results. Therefore, arrays of messages do not make much sense. However, arrays in the body are possible.
  • The type stream is special in that the other types (including any and message) have a fixed structure (elementary type and length), at least at execution time. Streams are, in general, unstructured. Typical examples are the IO streams stdin, stdout, and stderr of the operating system, or data streams generated by sensors.
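The chunking of large payloads mentioned for the type message can be sketched as follows. This is a minimal Python illustration, not the behavior of any SAP Data Hub operator: the function split_into_messages, the header keys chunk_index, chunk_count, and is_last, and the reading of 10 MB as 10 * 1024 * 1024 bytes applied to the body alone are all assumptions.

```python
from typing import Dict, Iterator, Tuple

# Assumption: the 10 MB limit is read as 10 MiB and applied to the body only.
MAX_BODY_BYTES = 10 * 1024 * 1024


def split_into_messages(
    payload: bytes, chunk_size: int = MAX_BODY_BYTES
) -> Iterator[Tuple[Dict[str, object], bytes]]:
    """Yield (header, body) pairs whose bodies never exceed chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Ceiling division; an empty payload still produces one (empty) message.
    chunk_count = max(1, -(-len(payload) // chunk_size))
    for index in range(chunk_count):
        start = index * chunk_size
        body = payload[start:start + chunk_size]
        # Hypothetical header fields that let a receiver reassemble the payload.
        header = {
            "chunk_index": index,
            "chunk_count": chunk_count,
            "is_last": index == chunk_count - 1,
        }
        yield header, body
```

A receiver could concatenate the bodies in chunk_index order and stop once is_last is set, which mirrors the idea that the header carries the information needed to interpret the body.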

Conversions

In general, there are no implicit type conversions or propagations. If the types are incompatible according to the rules above, you cannot run the graph. However, there are two exceptions to this rule.

Both exceptions concern the type message: the first applies to input ports of type message, and the second to output ports of type message.

Input ports of type message

If the output port of the other operator has a non-stream base type, the engine automatically transforms the output into a message. For message types, no transformation is needed; for all other types, the engine generates a minimal message that stores the output in its body.
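The wrapping behavior for input ports of type message can be pictured with the following Python sketch. It is a conceptual model, not the engine's implementation: the Message class, the to_message function, and the choice of an empty header for the minimal message are assumptions.

```python
from typing import Any, Dict, NamedTuple


class Message(NamedTuple):
    """Hypothetical stand-in for the message type: a header plus a body."""
    header: Dict[str, Any]
    body: Any


def to_message(value: Any) -> Message:
    """Pass messages through unchanged; wrap any other value in a minimal message."""
    if isinstance(value, Message):
        return value
    # Assumption: a "minimal message" has an empty header and the value as its body.
    return Message(header={}, body=value)
```

Under these assumptions, to_message("hello") produces a message with an empty header and the body "hello", while a value that is already a Message is returned as is.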

Output ports of type message

While the incoming message is automatically handled by the engine, the outgoing message triggers an action in the UI when you try to connect the ports.

Example:
  • The output port is of type message, or of an incomplete or generic type that would allow a message to be passed.
  • The input port of the receiving operator is of type string.
In such cases, there are two ways to transform the outgoing message to a string:
  • Concatenate the serialized header and body of the message into a string.
  • Use only the body of the message and output it as a string (if the body is itself a message, it is handled as in the first case).

The choice may well depend on the semantics of the receiving operator. Therefore, there is no single recommended approach.
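The two ways of turning a message into a string can be contrasted in a short Python sketch. The serialization format is not specified in this guide, so the sketch assumes, purely for illustration, that the header is serialized as JSON and the body is appended as text; the names Message, serialize_message, and body_as_string are hypothetical.

```python
import json
from typing import Any, Dict, NamedTuple


class Message(NamedTuple):
    """Hypothetical stand-in for the message type: a header plus a body."""
    header: Dict[str, Any]
    body: Any


def serialize_message(msg: Message) -> str:
    """First option: concatenate the serialized header and the serialized body."""
    if isinstance(msg.body, Message):
        body = serialize_message(msg.body)
    else:
        body = str(msg.body)
    # Assumption: the header is rendered as JSON with sorted keys.
    return json.dumps(msg.header, sort_keys=True) + body


def body_as_string(msg: Message) -> str:
    """Second option: use only the body; a nested message is handled as in the first option."""
    if isinstance(msg.body, Message):
        return serialize_message(msg.body)
    return str(msg.body)
```

For a message with header {"id": 1} and body "hello", serialize_message returns '{"id": 1}hello', whereas body_as_string returns 'hello'. Which result suits the receiving operator depends on whether it needs the header information.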