Modeling Guide

Copy File

The Copy File operator copies files and directories within a storage service. The expected input message is described in the Input section of this document.

This operation is recursive: it copies every file and directory under the given source path.

If the destination is an existing file, it is overwritten; if it is a non-empty directory, the operation fails.

  • Copying source: a/file1.txt to destination: newfile.txt, would succeed, since the destination does not exist.

  • Copying source: a/file1.txt to destination: b/f1.txt, would succeed and overwrite b/f1.txt, since the destination is an existing file.

  • Copying source: a/file1.txt to destination: b/, would fail, since b/ already exists and is not empty.

  • Copying source: a/ to destination: b/ would fail, since b/ already exists and is not empty.

  • Copying source: a/ to destination: b/dir/ would succeed, since b/dir/ does not exist.
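The rules illustrated by the examples above can be sketched in a few lines of Python. This is only an illustrative model, not the operator's implementation: the function names `classify_destination` and `copy_outcome` are hypothetical, the storage is modeled as a plain set of object keys, and treating a key that ends in `/` as a directory marker is an assumption. The behavior for an existing but empty directory is not stated in this document, so the sketch reports it as unspecified.

```python
def classify_destination(existing_keys, dst):
    """Classify dst as 'file', 'non_empty_dir', 'empty_dir', or 'missing'.

    existing_keys is a set of object keys. A key ending in '/' is treated
    as a directory marker (an assumption of this sketch).
    """
    if not dst.endswith("/") and dst in existing_keys:
        return "file"
    prefix = dst if dst.endswith("/") else dst + "/"
    # Any key strictly below the prefix means the directory has content.
    if any(key.startswith(prefix) and key != prefix for key in existing_keys):
        return "non_empty_dir"
    if prefix in existing_keys:
        return "empty_dir"
    return "missing"


def copy_outcome(existing_keys, dst):
    """Return the documented outcome of copying to dst."""
    kind = classify_destination(existing_keys, dst)
    if kind == "missing":
        return "succeeds"
    if kind == "file":
        return "succeeds (overwrites)"
    if kind == "non_empty_dir":
        return "fails"
    # Existing empty directory: not covered by the documentation.
    return "not specified"


if __name__ == "__main__":
    keys = {"a/file1.txt", "b/f1.txt"}
    for dst in ["newfile.txt", "b/f1.txt", "b/", "b/dir/"]:
        print(dst, "->", copy_outcome(keys, dst))
```

With the two keys used in the sketch, the four destinations reproduce the outcomes listed in the examples: success, overwrite, failure, and success.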

Supported services are:
  • Google Cloud Storage (GCS)

  • Amazon S3

  • Azure Storage Blob (WASB)

Configuration Parameters

| Parameter | Type | Description |
|---|---|---|
| service | string | The file service to operate on. Additional parameters may depend on the selected service. Default: "GCS" |
| timeoutInMs | int | The time limit, in milliseconds, for executing the operation. If `0`, no timeout is used. Default: 0 |
| retryPeriodInMs | int | The time interval, in milliseconds, between connection attempts. Default: 0 |
| numRetryAttempts | int | The number of times to retry a connection. Default: 0 |
| simultaneousRequests | int | The number of simultaneous requests generated on recursive calls (only available for GCS, S3 and WASB). Default: 1 |
| stopRequestOnError | boolean | Whether simultaneous requests from recursive calls should stop at the first error (only available for GCS, S3 and WASB). Default: false |
| authKey | string | GCS parameter: Mandatory. The service account JSON key. Default: "" |
| projectID | string | GCS parameter: Mandatory. The ID of the project to use. Default: "projectID" |
| bucket | string | GCS parameter: The name of the bucket where the files are located. Default: "bucket-name" |
| aws\_access\_key\_id | string | S3 parameter: Mandatory. The AWS access key ID. Default: "AWSAccessKeyId" |
| aws\_secret\_access\_key | string | S3 parameter: Mandatory. The AWS secret access key. Default: "AWSSecretAccessKey" |
| region | string | S3 parameter: Mandatory. The AWS region to create the bucket in. Default: "eu-central-1" |
| bucket | string | S3 parameter: Mandatory. The S3 bucket. Default: "testBucket" |
| accountName | string | WASB parameter: Mandatory. The WASB account name. Default: "" |
| accountKey | string | WASB parameter: Mandatory. The WASB account key. Default: "" |
| containerName | string | WASB parameter: Mandatory. The WASB container name. Default: "" |
| useHTTPS | boolean | WASB parameter: Whether to use HTTPS (true) or HTTP (false) for the connection to the WASB service. Default: true |
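As a rough illustration of how the mandatory parameters depend on the selected service, the following Python sketch checks a configuration dictionary before it is handed to the operator. It is not part of the operator: the helper `missing_mandatory` is hypothetical, the service values "S3" and "WASB" are assumed by analogy with the documented default "GCS", and all credential values are placeholders.

```python
# Mandatory parameters per service, as marked in the parameter descriptions.
# The keys "S3" and "WASB" are assumed service values; only "GCS" is
# confirmed as a literal value (it is the documented default).
MANDATORY = {
    "GCS": ["authKey", "projectID"],
    "S3": ["aws_access_key_id", "aws_secret_access_key", "region", "bucket"],
    "WASB": ["accountName", "accountKey", "containerName"],
}


def missing_mandatory(config):
    """Return the mandatory parameters that are absent or empty in config."""
    service = config.get("service", "GCS")
    if service not in MANDATORY:
        raise ValueError(f"unsupported service: {service!r}")
    return [name for name in MANDATORY[service] if not config.get(name)]


# Example S3 configuration with placeholder credentials.
s3_config = {
    "service": "S3",
    "aws_access_key_id": "<access-key-id>",
    "aws_secret_access_key": "<secret-access-key>",
    "region": "eu-central-1",
    "bucket": "testBucket",
    "simultaneousRequests": 4,
    "stopRequestOnError": True,
}

if __name__ == "__main__":
    print(missing_mandatory(s3_config))
    print(missing_mandatory({"service": "WASB", "accountName": "acct"}))
```

Running such a check before starting a graph surfaces incomplete credentials early, rather than at the first copy request.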

Input

| Input | Type | Description |
|---|---|---|
| in | message | A message carrying the source and destination paths in its attributes. |

Expected message attributes:

  • storage.src: Mandatory. Path to the source file or directory.

  • storage.dst: Mandatory. Path to the destination file or directory (should be of the same type as the source). The destination path may contain <dirname> and <basename>, which refer to the dirname and basename of the given source path. For example, copying src: data/A/user.csv to dst: <dirname>/processedFiles/<basename> maps to the destination data/A/processedFiles/user.csv.

  • storage.srcBucket: Overrides the bucket from the configuration for the source file or directory (GCS and S3 only).

  • storage.dstBucket: Overrides the bucket from the configuration for the destination file or directory (GCS and S3 only).
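The <dirname> and <basename> substitution described for storage.dst can be sketched in Python as follows. The operator performs this substitution itself; the sketch only illustrates the mapping. The helper names `resolve_destination` and `build_attributes` are hypothetical, and edge cases such as a trailing slash or a source path without a directory component are not specified in this document, so they are not handled here.

```python
import posixpath


def resolve_destination(src, dst_template):
    """Expand <dirname> and <basename> in dst_template using the source path."""
    dirname = posixpath.dirname(src)
    basename = posixpath.basename(src)
    return (dst_template
            .replace("<dirname>", dirname)
            .replace("<basename>", basename))


def build_attributes(src, dst, src_bucket=None, dst_bucket=None):
    """Build the attribute dictionary of an input message.

    The bucket overrides are optional and, per the attribute list, apply
    to GCS and S3 only.
    """
    attributes = {"storage.src": src, "storage.dst": dst}
    if src_bucket is not None:
        attributes["storage.srcBucket"] = src_bucket
    if dst_bucket is not None:
        attributes["storage.dstBucket"] = dst_bucket
    return attributes


if __name__ == "__main__":
    template = "<dirname>/processedFiles/<basename>"
    # The message carries the template; the operator resolves it.
    print(build_attributes("data/A/user.csv", template))
    print(resolve_destination("data/A/user.csv", template))
    # prints "data/A/processedFiles/user.csv"
```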

Output

| Output | Type | Description |
|---|---|---|
| out | message | A copy of the input message, sent once the operation has completed successfully. |