Modeling Guide for SAP Data Hub

Port Data

Port types are not known at compile time. The subengine uses the handle, v2_datum_t to carry the generic inputs and outputs between the operators. The v2_datum_t handle is a reference-counted handle to an underlying piece of data.

Also, because this is a pure C API, you cannot use constructors and destructors to acquire and release these handles. Therefore, we recommend that you follow these guidelines:
  • If you get a datum as a parameter in the port handler, do not release it. The engine does that.
  • If you create a datum using the v2_datum_create set of functions, release it before it gets out of scope. Not releasing it may result in memory leaks. It is also important to release it even if you output the datum because this datum is going to be given to the next operator downstream, and the call to v2_process_output increases its reference count by one. This is similar to making a copy of a std::shared_ptr.

Data Types

A datum may contain any one of the following types:

Modeler type

C type

ID ( v2_type_t )

Size (bytes)

string

char*

V2_TYPE_STRING

Length of the string

int64

int64_t

V2_TYPE_INT64

8

float64

double

V2_TYPE_DOUBLE

8

blob

blob

V2_TYPE_BLOB

Size of the blob

uint64

uint64_t

V2_TYPE_UINT64

8

message

v2_message_t

V2_TYPE_MESSAGE

0 (Size of a message is only known when it is serialized, so it cannot be queried).

byte

char

V2_TYPE_BYTE

1

User-defined data types

void*

V2_TYPE_CUSTOM

0 (Engine does not know the size of user-defined data types.)

For each data type, the ID column in the table shows the return value of v2_datum_get_type. This allows you to have generic operators (for example, ports with type any) that can check their input type at runtime.

The Size column in the table shows the return value of v2_datum_get_size.

Ownership

Whenever the reference count of datum reaches zero, the engine deallocates the memory that belongs to the datum. For string, blob, message, and custom types this behavior can have one further consequence: whether or not to free the memory associated with the data itself. So, when creating a datum from one of those types, you can choose whether or not you want it to own the data:
  • A nonowning datum is created using the v2_datum_from_<type> functions, which do not deallocate the given data pointer. Thus, ensure that the functions remain available throughout the lifetime of the datum. This behavior is typical of static lifetime variables. For example:
    const char* YES = "yes";
    const char* NO  = "no";
    
    // A port handler
    const char* is_even_input( v2_process_t proc, v2_datum_t datum )
    {
      int64_t i = v2_datum_get_int64( datum );
      v2_datum_t out;
      if ( i % 2 ) // not even
        out = v2_datum_from_string( NO, 2 );
      else
        out = v2_datum_from_string( YES, 3 );
      v2_process_output( proc, "outEven", out );
      // Release local handle
      v2_datum_release( out );
      return NULL;
    }
  • An owning datum is created using the v2_datum_own_<type> functions. These functions take not only the pointer to the data, but also a "destructor" function pointer, which will be invoked with the data pointer as the argument. For example, an operator that concatenates the strings it receives two by two:
    // Port handler
    const char* concat_on_input( v2_process_t proc, v2_datum_t datum )
    {
      v2_datum_t previous = (v2_datum_t)v2_process_get_user_data( proc );
    
      if ( previous == NULL ) {
        // Store this datum for later use
        v2_process_set_user_data( proc, datum );
        // Let the engine know we've just stored an additional
        // reference to this datum
        v2_datum_acquire( datum );
      } else {
        char*    prev_str  = v2_datum_get_string( previous );
        uint64_t prev_size = v2_datum_get_size( previous );
    
        char*    curr_str  = v2_datum_get_string( datum );
        uint64_t curr_size = v2_datum_get_size( datum );
    
        // Allocate enough memory for both strings together. Null-terminating it
        // is optional, since we provide its length to v2_datum_own_string.
        uint64_t out_size = prev_size + curr_size;
        char* out_string = (char*)malloc( out_size );
        // Copy them to out_string
        strncpy( out_string, prev_str, prev_size );
        strncpy( out_string + prev_size, curr_str, curr_size );
    
        // Since out_string is on the heap, we need an owning datum
        // so it will call `free` on the string once its done.
        v2_datum_t out = v2_datum_own_string( out_string, out_size, free );
        v2_process_output( proc, "output", out );
        // Cleanup
        v2_datum_release( out );
        v2_datum_release( previous );
        v2_process_set_user_data( proc, NULL );
      }
    
      return NULL;
    }

User-defined Types

Declare the SAP Data Hub Modeler name for a user-defined types in the meta.json file that is under the root directory of the subengine. Typically, it is <repo-root>/subengines/<sub-engine-id>/meta.json .

For example, if you are creating image-processing operators, you may want to define your own "image" type. This user-defined type helps the data to flow among such operators without having to conform to the standard types in the modeler. In this example, your meta.json might look like this:
{
  "versions": ["1.0"],
  "types": ["image"]
}
Adding a type to meta.json enables the main engine to recognize its existence and accept graphs that use these types as valid.