Modeling Guide

Move File

The Move File operator moves (renames) files in a file service. The expected input message is described in the Input section of this document.

This operation is recursive: it moves all files under the given source path. How a path is interpreted depends on the service: as a directory (ADLS, file, HDFS, WebHDFS) or as a prefix (GCS, S3, WASB).
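The difference between the two interpretations can be illustrated with a minimal Python sketch. It shows the general distinction between directory matching and prefix matching only; the helper names are hypothetical, and whether the operator normalizes the source path (for example, by appending a trailing slash) is not stated here.

```python
def under_directory(path: str, source: str) -> bool:
    """Directory semantics: match only whole path components below source."""
    source = source.rstrip("/")
    return path == source or path.startswith(source + "/")


def under_prefix(key: str, source: str) -> bool:
    """Prefix semantics: match any object key that starts with source."""
    return key.startswith(source)


# Hypothetical file listing used only for illustration.
paths = ["a/file1.txt", "a/sub/file2.txt", "ab/file3.txt"]

as_directory = [p for p in paths if under_directory(p, "a")]
as_prefix = [p for p in paths if under_prefix(p, "a")]

print(as_directory)
print(as_prefix)
```

Under prefix semantics the source "a" also matches "ab/file3.txt", so a trailing slash in the source path can matter on prefix-based services.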

If the destination is an existing file, it is overwritten; if it is a non-empty directory, the operation fails. For example:

  • Moving source: a/file1.txt to destination: newfile.txt would succeed, since the destination does not exist.

  • Moving source: a/file1.txt to destination: b/f1.txt would succeed and overwrite b/f1.txt, since the destination is an existing file.

  • Moving source: a/file1.txt to destination: b/ would fail, since b/ already exists and is not empty.

  • Moving source: a/ to destination: b/ would fail, since b/ already exists and is not empty.

  • Moving source: a/ to destination: b/dir/ would succeed, since b/dir/ does not exist.
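The destination rules illustrated by the examples above can be sketched in Python against an in-memory set of file paths. This is an illustrative model, not the operator's implementation: the function name is hypothetical, and because the model stores only files, it cannot represent an empty destination directory (a case the rules above do not cover).

```python
def destination_outcome(existing_files: set, destination: str) -> str:
    """Classify a move by the state of its destination path."""
    # The destination is an existing file: it is overwritten.
    if destination in existing_files:
        return "overwrite"
    # The destination is a directory that already contains files: the move fails.
    directory = destination.rstrip("/") + "/"
    if any(path.startswith(directory) for path in existing_files):
        return "fail"
    # The destination does not exist: the move succeeds.
    return "succeed"


# Hypothetical state matching the examples: one file in a/, one file in b/.
files = {"a/file1.txt", "b/f1.txt"}

for dst in ["newfile.txt", "b/f1.txt", "b/", "b/dir/"]:
    print(dst, "->", destination_outcome(files, dst))
```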

Supported services are:
  • Azure Data Lake Store (ADLS)

  • Local File System (file)

  • Google Cloud Storage (GCS)

  • HDFS

  • Amazon S3

  • Azure Storage Blob (WASB)

  • WebHDFS

Configuration Parameters

Parameter

Type

Description

service

string

The file service to operate on. Additional parameters may depend on the selected service.

Default: "file"

timeoutInMs

int

The time limit, in milliseconds, for executing the operation. If `0`, no timeout is used.

Default: 0

retryPeriodInMs

int

The time interval, in milliseconds, between connection attempts.

Default: 0

numRetryAttempts

int

The number of times to retry a connection.

Default: 0

simultaneousRequests

int

The number of simultaneous requests generated on recursive calls (only available for GCS, S3 and WASB).

Default: 1

stopRequestOnError

boolean

Whether simultaneous requests from recursive calls should stop at the first error (only available for GCS, S3 and WASB).

Default: false

terminateOnError

boolean

Whether the graph should terminate when the operator fails.

Default: true

connection

object

Holds the connection information for the services.

Default:

configurationType

string

connection parameter: The type of connection information to use: Manual (user input) or retrieved from the Connection Management Service.

Default: ""

connectionID

string

connection parameter: The ID of the connection information to retrieve from the Connection Management Service.

Default: ""

connectionProperties

object

connection parameter: All the connection properties for the selected service for manual input.

clientId

string

ADLS parameter: Mandatory. The client ID from ADLS.

Default: ""

tenantId

string

ADLS parameter: Mandatory. The tenant ID from ADLS.

Default: ""

clientKey

string

ADLS parameter: Mandatory. The client key from ADLS.

Default: ""

accountName

string

ADLS parameter: Mandatory. The account name from ADLS.

Default: ""

rootPath

string

ADLS parameter: The optional root path name for browsing. Starts with a slash (e.g. /MyFolder/MySubfolder).

Default: "/MyFolder/MySubfolder"

host

string

HDFS parameter: Mandatory. The IP address of the Hadoop name node.

Default: "127.0.0.1"

port

string

HDFS parameter: Mandatory. The port of the Hadoop name node.

Default: "9000"

user

string

HDFS parameter: Mandatory. The Hadoop user name.

Default: "hdfs"

rootPath

string

HDFS parameter: The optional root path name for browsing. Starts with a slash (e.g. /MyFolder/MySubfolder).

Default: "/MyFolder/MySubfolder"

keyFile

string

GCS parameter: Mandatory. The service account JSON key.

Default: ""

projectId

string

GCS parameter: Mandatory. The ID of the project to use.

Default: "projectID"

rootPath

string

GCS parameter: The optional root path name for browsing. Starts with a slash and the bucket name (e.g. /MyBucket/MyFolder).

Default: "/MyBucket/MyFolder"

accessKey

string

S3 parameter: Mandatory. The AWS access key ID.

Default: "AWSAccessKeyId"

secretKey

string

S3 parameter: Mandatory. The AWS secret access key.

Default: "AWSSecretAccessKey"

endpoint

string

S3 parameter: An optional custom endpoint (e.g. http://awsEndpointURL).

Default: ""

awsProxy

string

S3 parameter: The optional proxy URL.

Default: ""

region

string

S3 parameter: Mandatory. The AWS region to create the bucket in.

Default: "eu-central-1"

rootPath

string

S3 parameter: Mandatory. The optional root path name for browsing. Starts with a slash and the bucket name (e.g. /MyBucket/MyFolder).

Default: "/MyBucket/MyFolder"

protocol

string

S3 parameter: Mandatory. The protocol schema to be used (HTTP or HTTPS).

Default: "HTTP"

accountName

string

WASB parameter: Mandatory. The account name from WASB.

Default: ""

accountKey

string

WASB parameter: Mandatory. The account key from WASB.

Default: ""

rootPath

string

WASB parameter: Mandatory. The optional root path name for browsing. Starts with a slash and the **container** name (e.g. /MyContainer/MyFolder).

Default: "/MyContainer/MyFolder"

protocol

boolean

WASB parameter: The protocol schema to be used (WASBS/HTTPS or WASB/HTTP)

Default: true

rootPath

string

WebHDFS parameter: The optional root path name for browsing. Starts with a slash (e.g. /MyFolder/MySubfolder).

Default: "/MyFolder/MySubfolder"

protocol

string

WebHDFS parameter: Mandatory. The scheme used on WebHDFS connection (webhdfs/http or swebhdfs/https).

Default: "webhdfs"

host

string

WebHDFS parameter: Mandatory. The IP address of the WebHDFS node.

Default: "127.0.0.1"

port

string

WebHDFS parameter: Mandatory. The port of the WebHDFS node.

Default: "9000"

user

string

WebHDFS parameter: Mandatory. The WebHDFS user name.

Default: "hdfs"

webhdfsToken

string

WebHDFS parameter: The token to authenticate to WebHDFS with.

Default: ""

webhdfsOAuthToken

string

WebHDFS parameter: The OAuth token to authenticate to WebHDFS with.

Default: ""

webhdfsDoAs

string

WebHDFS parameter: The user to impersonate. Has to be used together with webhdfsUser.

Default: ""
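As a rough illustration of how the parameters above fit together, the following Python sketch builds a configuration for S3 with manually entered connection properties and checks that the properties marked Mandatory are present. The nesting of the dictionary, the service identifier strings other than "file", the "Manual" value, and the helper name are assumptions made for this example; the operator's actual serialization may differ.

```python
# Properties marked "Mandatory" in the parameter table, grouped by service.
# The service keys are assumed identifiers, not confirmed values.
MANDATORY = {
    "ADLS": ["clientId", "tenantId", "clientKey", "accountName"],
    "HDFS": ["host", "port", "user"],
    "GCS": ["keyFile", "projectId"],
    "S3": ["accessKey", "secretKey", "region", "rootPath", "protocol"],
    "WASB": ["accountName", "accountKey", "rootPath"],
    "WebHDFS": ["protocol", "host", "port", "user"],
}


def missing_mandatory(service: str, properties: dict) -> list:
    """Return the mandatory property names that are absent or empty."""
    return [name for name in MANDATORY.get(service, []) if not properties.get(name)]


# Hypothetical configuration; all values are placeholders.
config = {
    "service": "S3",
    "timeoutInMs": 0,
    "retryPeriodInMs": 1000,
    "numRetryAttempts": 3,
    "simultaneousRequests": 4,
    "stopRequestOnError": True,
    "terminateOnError": True,
    "connection": {
        "configurationType": "Manual",
        "connectionProperties": {
            "accessKey": "AWSAccessKeyId",
            "secretKey": "AWSSecretAccessKey",
            "region": "eu-central-1",
            "rootPath": "/MyBucket/MyFolder",
            "protocol": "HTTPS",
        },
    },
}

missing = missing_mandatory(
    config["service"], config["connection"]["connectionProperties"]
)
print(missing)
```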

Input

Input

Type

Description

in

message

A message that specifies the source file path and the destination. Expected message attributes:
  • storage.src: Mandatory. Path to the source file or directory.
  • storage.dst: Mandatory. Path to the destination file or directory (must be of the same type as the source). The destination path may contain <dirname> and <basename>, which refer to the dirname and basename of the given source path. For example, moving src: data/A/user.csv to dst: <dirname>/processedFiles/<basename> maps to the destination data/A/processedFiles/user.csv.
  • storage.srcBucket: Overrides the bucket from the configuration for the source file or directory (GCS and S3 only).
  • storage.dstBucket: Overrides the bucket from the configuration for the destination file or directory (GCS and S3 only).
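The <dirname> and <basename> placeholders described above can be illustrated with a short Python sketch. The function name is hypothetical and the substitution is a plain string replacement; how the operator handles source paths with a trailing slash is not covered here.

```python
import posixpath


def expand_destination(src: str, dst_template: str) -> str:
    """Replace <dirname> and <basename> with the parts of the source path."""
    dirname = posixpath.dirname(src)
    basename = posixpath.basename(src)
    return dst_template.replace("<dirname>", dirname).replace("<basename>", basename)


# Message attributes as they might be set on the input message.
attributes = {
    "storage.src": "data/A/user.csv",
    "storage.dst": "<dirname>/processedFiles/<basename>",
}

resolved = expand_destination(attributes["storage.src"], attributes["storage.dst"])
print(resolved)  # data/A/processedFiles/user.csv
```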

Output

Output

Type

Description

out

message

A copy of the input message, sent once the operation has succeeded.