Write File
The Write File operator writes files to a storage service. The following services are supported:

- Azure Data Lake Store (ADLS)
- Local File System (file)
- Google Cloud Storage (GCS)
- HDFS
- Amazon S3
- Azure Storage Blob (WASB)
- WebHDFS
Configuration Parameters
| Parameter | Type | Description |
| --- | --- | --- |
| mode | string | Controls whether the target file should be appended to, created (avoiding overwrites), or overwritten (truncated if it already exists, created if it does not). May be set dynamically through the message header storage.writeMode. Default: "append" |
| path | string | A formatted string describing the output path for files. See Path Formatting below for details and examples. Default: `/tmp/file_<counter>.txt` |
| numRetryAttempts | int | The number of times to retry a connection. Default: 0 |
| retryPeriodInMs | int | The time interval in milliseconds between connection retries. Default: 0 |
| terminateOnError | boolean | Sets whether the graph should terminate when the operator fails. Default: "true" |
| connection | object | Holds the connection information for the services. |
| configurationType | string | Connection parameter: which type of connection information will be used: "Manual" (user input) or retrieved by the Connection Management Service. Default: "" |
| connectionID | string | Connection parameter: the ID of the connection information to retrieve from the Connection Management Service. Default: "" |
| connectionProperties | object | Connection parameter: all the connection properties for the selected service, for manual input. |
| clientId | string | ADL parameter: Mandatory. The client ID from ADLS. Default: "" |
| tenantId | string | ADL parameter: Mandatory. The tenant ID from ADLS. Default: "" |
| clientKey | string | ADL parameter: Mandatory. The client key from ADLS. Default: "" |
| accountName | string | ADL parameter: Mandatory. The account name from ADLS. Default: "" |
| rootPath | string | ADL parameter: The optional root path name for browsing. Starts with a slash (e.g. /MyFolder/MySubfolder). Default: "/MyFolder/MySubfolder" |
| host | string | HDFS parameter: Mandatory. The IP address of the Hadoop name node. Default: "127.0.0.1" |
| port | string | HDFS parameter: Mandatory. The port of the Hadoop name node. Default: "9000" |
| user | string | HDFS parameter: Mandatory. The Hadoop user name. Default: "hdfs" |
| rootPath | string | HDFS parameter: The optional root path name for browsing. Starts with a slash (e.g. /MyFolder/MySubfolder). Default: "/MyFolder/MySubfolder" |
| keyFile | string | GCS parameter: Mandatory. The service account JSON key. Default: "" |
| projectId | string | GCS parameter: Mandatory. The ID of the project to use. Default: "projectID" |
| rootPath | string | GCS parameter: The optional root path name for browsing. Starts with a slash and the bucket name (e.g. /MyBucket/MyFolder). Default: "/MyBucket/MyFolder" |
| accessKey | string | S3 parameter: Mandatory. The AWS access key ID. Default: "AWSAccessKeyId" |
| secretKey | string | S3 parameter: Mandatory. The AWS secret access key. Default: "AWSSecretAccessKey" |
| endpoint | string | S3 parameter: An optional custom endpoint, e.g. http://awsEndpointURL. Default: "" |
| awsProxy | string | S3 parameter: The optional proxy URL. Default: "" |
| region | string | S3 parameter: Mandatory. The AWS region in which to create the bucket. Default: "eu-central-1" |
| rootPath | string | S3 parameter: The optional root path name for browsing. Starts with a slash and the bucket name (e.g. /MyBucket/MyFolder). Default: "/MyBucket/MyFolder" |
| protocol | string | S3 parameter: Mandatory. The protocol scheme to be used (HTTP or HTTPS). Default: "HTTP" |
| accountName | string | WASB parameter: Mandatory. The account name from WASB. Default: "" |
| accountKey | string | WASB parameter: Mandatory. The account key from WASB. Default: "" |
| rootPath | string | WASB parameter: The optional root path name for browsing. Starts with a slash and the container name (e.g. /MyContainer/MyFolder). Default: "/MyContainer/MyFolder" |
| protocol | boolean | WASB parameter: Whether to use the secure protocol: true for WASBS (HTTPS), false for WASB (HTTP). Default: true |
| rootPath | string | WebHDFS parameter: The optional root path name for browsing. Starts with a slash (e.g. /MyFolder/MySubfolder). Default: "/MyFolder/MySubfolder" |
| protocol | string | WebHDFS parameter: Mandatory. The scheme used for the WebHDFS connection (webhdfs/http or swebhdfs/https). Default: "webhdfs" |
| host | string | WebHDFS parameter: Mandatory. The IP address of the WebHDFS node. Default: "127.0.0.1" |
| port | string | WebHDFS parameter: Mandatory. The port of the WebHDFS node. Default: "9000" |
| user | string | WebHDFS parameter: Mandatory. The WebHDFS user name. Default: "hdfs" |
| webhdfsToken | string | WebHDFS parameter: The token to authenticate to WebHDFS with. Default: "" |
| webhdfsOAuthToken | string | WebHDFS parameter: The OAuth token to authenticate to WebHDFS with. Default: "" |
| webhdfsDoAs | string | WebHDFS parameter: The user to impersonate. Has to be used together with `webhdfsUser`. Default: "" |
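As an illustration of how these parameters fit together, a manually configured operator writing to S3 might look roughly like the following. The exact serialized shape of the configuration is an assumption for illustration only, and all values are placeholders, not real credentials:

```json
{
  "mode": "append",
  "path": "/tmp/file_<counter>.txt",
  "numRetryAttempts": 3,
  "retryPeriodInMs": 1000,
  "terminateOnError": true,
  "connection": {
    "configurationType": "Manual",
    "connectionProperties": {
      "accessKey": "AWSAccessKeyId",
      "secretKey": "AWSSecretAccessKey",
      "region": "eu-central-1",
      "protocol": "HTTPS",
      "rootPath": "/MyBucket/MyFolder"
    }
  }
}
```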
Input
| Input | Type | Description |
| --- | --- | --- |
| inFile | message | A message whose body (blob) will be written to a file. There are no requirements on the message's headers other than those referred to by the path and mode configuration parameters. |
Output
| Output | Type | Description |
| --- | --- | --- |
| outFilename | string | The path to the file to which content is written or appended. Whether this path is relative or absolute depends on how it was specified in the path configuration. |
Path Formatting
Strings in the path configuration are subject to the following rules:

- Schemes can be invoked using angle brackets: the string <foo> will be replaced by the result of the scheme named "foo". Available schemes are:
  - counter: an incremental integer
  - date: the current local date in the format YYYYMMDD
  - time: the current local time in the format HHMMSS

  Any other (unrecognized) scheme name will cause an error to be thrown.
- Message headers can be queried using ${...}. For example, ${bar} will be replaced by the value of header "bar" in the message given to inFile. A literal dollar sign must always be escaped with a backslash, otherwise it will be treated as the start of a substitution parameter. A default value can be set using an equals sign: ${bar=lorem} will be replaced by the value "lorem" whenever the input message lacks the "bar" header. If no default value is set and the message is missing the header, an error will be thrown.
- Anything else (that is not between < and > or ${ and }) will be left untouched.

Limitations:

- The following characters cannot appear in scheme or message header names: <, >, $, {, }.
- Empty scheme (<>) or header (${}) names will be left untouched.
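The substitution rules above can be sketched in Python. This is a hypothetical re-implementation for illustration, not the operator's actual code; escape handling for literal dollar signs is omitted, and details such as counter persistence may differ in the real operator:

```python
import re
from datetime import datetime
from itertools import count

# Incremental integer for the <counter> scheme; starts at 0 for this sketch.
_counter = count()

def expand_path(template, headers):
    """Expand <scheme> and ${header[=default]} placeholders in a path."""
    def scheme(match):
        name = match.group(1)
        if name == "counter":
            return str(next(_counter))
        if name == "date":
            return datetime.now().strftime("%Y%m%d")
        if name == "time":
            return datetime.now().strftime("%H%M%S")
        # Unrecognized scheme names cause an error.
        raise ValueError("unknown scheme: %s" % name)

    def header(match):
        name, sep, default = match.group(1).partition("=")
        if name in headers:
            return headers[name]
        if sep:  # an "=" was present, so a default value exists
            return default
        raise KeyError("missing header: %s" % name)

    # Names cannot contain < > $ { }; empty names (<> and ${}) are left
    # untouched because the character classes require at least one character.
    out = re.sub(r"<([^<>${}]+)>", scheme, template)
    out = re.sub(r"\$\{([^<>${}]+)\}", header, out)
    return out
```

For example, `expand_path("mydir_<date>/${kafka.topic}.csv", {"kafka.topic": "noise"})` yields a path like `mydir_20170131/noise.csv`, while a missing header with a default, `${bar=lorem}`, falls back to `lorem`.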
Example: Basic Usage

Setting path to mydir_<date>_<time>/${kafka.topic}.csv produces output files like the following:

mydir_20170131_234550/noise.csv
mydir_20170201_080010/temperature.csv
mydir_20170201_080010/humidity.csv
Example: Copying Directories

Setting path to outputDir/${storage.pathInPolledDir} can expand to, for example:

outputDir/YearReport.docx
outputDir/January/sales.pdf
outputDir/January/finance.xlsx
outputDir/February/sales.pdf