Modeling Guide for SAP Data Hub

Write File

The Write File operator writes files to a storage service.

The operator takes a single input: the content of the file to write. The target file is configured in the operator's path parameter. To change the target dynamically, see Path Formatting below.

Configuration Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| mode | string | Controls whether the target file is appended to, created (avoiding overwrites), or overwritten (truncated if it already exists, created if it does not). It may be set dynamically through the message header storage.writeMode. Default: "append" |
| path | string | A formatted string describing the output path for files. See Path Formatting below for details and examples. Default: "/tmp/file_<counter>.txt" |
| numRetryAttempts | int | The number of times to retry a failed connection. Default: 0 |
| retryPeriodInMs | int | The time interval, in milliseconds, between connection attempts. Default: 0 |
| terminateOnError | boolean | Whether the graph should terminate when the operator fails. Default: "true" |
| connection | object | Holds the connection information for the storage services. The connection parameters for each service are documented separately. |
| configurationType | string | Connection parameter: which type of connection information is used, either manual (user input) or retrieved from the Connection Management Service. Default: "" |
| connectionID | string | Connection parameter: the ID of the connection information to retrieve from the Connection Management Service. Default: "" |
| connectionProperties | object | Connection parameter: all the connection properties of the selected service, for manual input. |

Input

| Input | Type | Description |
| --- | --- | --- |
| inFile | message | A message whose body (blob) will be written to a file. There are no requirements on the message's headers other than those referred to in the path and mode configuration parameters. |

Output

| Output | Type | Description |
| --- | --- | --- |
| outFilename | message | A message whose body is the path of the file to which the content was written or appended. Whether this path is relative or absolute depends on how it was given in the path configuration. The header message.error (bool) reports whether the operation was successful. Any other header from the input message is copied to this message. |
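The output contract above can be sketched in Python; the dict-based message shape and the make_out_message name are hypothetical conveniences for illustration, not the operator's API.

```python
# Hypothetical sketch of how the outFilename message could be assembled:
# copy every input header, then add the message.error flag.
def make_out_message(in_message, written_path, error=False):
    headers = dict(in_message.get("headers", {}))  # copy all input headers
    headers["message.error"] = error               # flag the write outcome
    return {"headers": headers, "body": written_path}
```

A downstream operator could then inspect the message.error header before consuming the file path in the body.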

Path Formatting

Strings in the path configuration are subject to the following rules:

  • Schemes can be invoked using angle brackets: the string <foo> will be replaced by the result of the scheme named "foo". Available schemes are:
    • counter: an incremental integer

    • date: the current local date in the format YYYYMMDD

    • time: the current local time in the format HHMMSS

    Any other (unrecognized) scheme name will cause an error to be thrown.

  • Message headers can be queried using \${...}. For example, \${bar} will be replaced by the value of header "bar" in the message given to inFile. Note that the dollar sign must always be escaped with a backslash (\${bar}); otherwise, the Pipeline Modeler will interpret it as a substitution parameter.
    • A default value can be set using an equals sign: \${bar=lorem} will be replaced by the value "lorem" whenever the input message lacks the "bar" header. If no default value is set and the message is missing the header, an error will be thrown.

  • Anything else (that is not between < and > or \${ and }) will be left untouched.
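The rules above can be sketched in Python. PathFormatter and its regular expressions are an illustrative reconstruction of the documented behavior, not the operator's code; note that in plain Python strings the dollar sign needs no backslash escaping.

```python
import re
from datetime import datetime

class PathFormatter:
    """Illustrative expansion of <scheme> and ${header[=default]} placeholders."""

    def __init__(self):
        self._counter = 0  # "counter" scheme: an incremental integer

    def _scheme(self, name):
        if name == "counter":
            value, self._counter = self._counter, self._counter + 1
            return str(value)
        if name == "date":
            return datetime.now().strftime("%Y%m%d")  # local date YYYYMMDD
        if name == "time":
            return datetime.now().strftime("%H%M%S")  # local time HHMMSS
        raise ValueError("unrecognized scheme: " + name)  # per the rules above

    def format(self, path, headers):
        def sub_header(match):
            name, sep, default = match.group(1).partition("=")
            if name in headers:
                return str(headers[name])
            if sep:                      # an "=" was present: use the default
                return default
            raise KeyError("missing header: " + name)

        # Names may not contain < > $ { }; empty names are left untouched.
        path = re.sub(r"<([^<>${}]+)>", lambda m: self._scheme(m.group(1)), path)
        return re.sub(r"\$\{([^<>${}]+)\}", sub_header, path)
```

For example, a formatter would expand "file_<counter>.txt" to "file_0.txt" on the first file and "file_1.txt" on the second, and "\${bar=lorem}.csv" to "lorem.csv" when the "bar" header is absent.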

Limitations

  • The following characters cannot appear in scheme or message header names: < > $ { }.

  • Empty scheme (<>) or header (\${}) names will be left untouched.

Example: Basic Usage

Suppose you have messages coming from a Kafka Consumer whose topic describes the type of sensor that sent the message. Then, you can use a path such as:
mydir_<date>_<time>/\${kafka.topic}.csv
to produce output files like the following:
mydir_20170131_234550/noise.csv
mydir_20170201_080010/temperature.csv
mydir_20170201_080010/humidity.csv

Example: Copying Directories

If we want to reproduce an entire directory structure, we can set a File, HDFS, or S3 Consumer to poll that directory and use the storage.pathInPolledDirectory header to refer to each file's location within it:
outputDir/\${storage.pathInPolledDirectory}
can be expanded to, for example:
outputDir/YearReport.docx
outputDir/January/sales.pdf
outputDir/January/finance.xlsx
outputDir/February/sales.pdf