Modeling Guide for SAP Data Hub

Amazon S3

Amazon S3 (AWS S3) is an object store service; see the AWS documentation for details. Other services that support the S3 API, such as Rook, Minio, and Swift, have been tested and may also be used. Compatibility with any other service supporting the S3 API is not guaranteed.

This section documents only how the operators relate to the S3 API.

This document may refer to an object as a "file", and to an object's prefix as a "directory", if it fits the context of the operator.

Connection

To use an operator that connects to S3, either use a Connection ID from Connection Management or configure a Manual connection with the following values:
  • Custom endpoint
    Allows using a custom endpoint to access the S3 service. If not set, the default AWS endpoint is used.
    • ID: endpoint
    • Type: string
    • Default: ""
  • Protocol [Mandatory]
    Sets the protocol to use. The value set here overrides any protocol prefix given in the Custom endpoint configuration.
    • ID: protocol
    • Type: string
    • Default: "HTTP"
    • Possible values:
      • "HTTP"
      • "HTTPS"
  • Region [Mandatory]
    The AWS region the configured bucket (found in Root Path) belongs to.
    • ID: region
    • Type: string
    • Default: "eu-central-1"
  • Access Key [Mandatory]
    The Access Key ID used to authenticate to the service. It pairs with the Secret Key in order to authenticate.
    • ID: accessKey
    • Type: string
    • Default: "AWSAccessKeyID"
  • Secret Key [Mandatory]
    The Secret Access Key used to authenticate to the service. It pairs with the Access Key in order to authenticate.
    • ID: secretKey
    • Type: string
    • Default: "AWSSecretAccessKey"
  • Root Path
    The bucket and an optional root path name for browsing. The value starts with a slash and the bucket name, followed by another slash and the optional root path (e.g. /MyBucket/My Folder). Dataset names for this connection do not contain segments of the rootPath; instead, their first segment is a subdirectory of the root path.
    • ID: rootPath
    • Type: string
    • Default: "/MyBucket/MyFolder"
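How the Protocol value interacts with a Custom endpoint can be sketched as follows. This is a minimal Python illustration, not the operators' actual implementation: the function name effective_endpoint is hypothetical, and returning an empty string to mean "use the default AWS endpoint" is an assumption made for the example.

```python
def effective_endpoint(endpoint: str, protocol: str) -> str:
    """Apply the configured protocol to a custom endpoint (hypothetical helper)."""
    if protocol not in ("HTTP", "HTTPS"):
        raise ValueError(f"unsupported protocol: {protocol!r}")
    if not endpoint:
        # Assumption: an empty endpoint means the default AWS endpoint is used.
        return ""
    # Drop any protocol prefix given in the endpoint; the Protocol setting wins.
    host = endpoint.split("://", 1)[-1]
    return f"{protocol.lower()}://{host}"
```

For example, an endpoint of "http://minio.local:9000" combined with Protocol "HTTPS" would be treated as "https://minio.local:9000" in this sketch.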
Further connection configurations that are not part of Connection Management may also be set:
  • Bucket
    Optional bucket name to be accessed. It works as a "fallback" of the Connection's Root Path configuration. For instance, if no bucket is given in the Root Path, the value from Bucket is used.
    • ID: awsBucket
    • Type: string
    • Default: "com.sap.datahub.test"
  • Proxy
    An optional proxy to be used in the connection to the service.
    • ID: awsProxy
    • Type: string
    • Default: ""
  • Use SSL
    Whether to use SSL/TLS when connecting to the service.
    • ID: useSSL
    • Type: boolean
    • Default: true
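The relation between Root Path and the Bucket fallback can be sketched as below. This is a minimal illustration under stated assumptions: the function name split_root_path is hypothetical, and raising an error when neither setting provides a bucket is an assumption, not documented operator behavior.

```python
def split_root_path(root_path: str, aws_bucket: str = "") -> tuple[str, str]:
    """Split a Root Path such as '/MyBucket/MyFolder' into (bucket, root prefix).

    If the Root Path contains no bucket, fall back to the Bucket (awsBucket) value.
    """
    parts = root_path.strip("/").split("/", 1)
    bucket = parts[0] or aws_bucket
    prefix = parts[1] if len(parts) > 1 else ""
    if not bucket:
        # Assumption: with no bucket from either setting, nothing can be accessed.
        raise ValueError("no bucket in Root Path and no Bucket fallback set")
    return bucket, prefix
```

With rootPath "/MyBucket/MyFolder", the sketch yields the bucket "MyBucket" and the root prefix "MyFolder"; with an empty rootPath and awsBucket "com.sap.datahub.test", the fallback bucket is used.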

Permissions

AWS permissions are required to operate on S3 objects. Each operator may require a specific set of permissions to work.

Read File Permissions

To read a single object ("file"), you need the permission:
  • s3:GetObject for the given object.
To read multiple objects in a prefix ("directory"), you need the permission:
  • s3:ListBucket for the bucket where the prefix is to be listed. Note that the permission may be narrowed to a directory inside the bucket, and the prefix is subject to this restriction. See also AWS S3 GET Bucket (information published on a non-SAP site).
If Delete After Send is being used, you also need the permission:
  • s3:DeleteObject for the given object.

Write File Permissions

To write an object ("file"), you need the permission:
  • s3:PutObject for the given object.
If using mode "Append", you also need:
  • s3:GetObject for the given object. See also AWS S3 GET Object (information published on a non-SAP site). This is required because of the restriction documented in Write File Restrictions.

Remove File Permissions

To remove an object ("file"), you need the permission:
  • s3:DeleteObject for the given object.

Move File Permissions

Because moving in S3 consists of copying and then removing, you need the permissions documented in Copy File Permissions and Remove File Permissions.

Copy File Permissions

To copy an object ("file"), you need the permissions:
  • s3:GetObject for the source object.
  • s3:PutObject for the bucket to receive the copied object. See also AWS S3 Multipart Upload API and Permissions (information published on a non-SAP site).
If copying by prefix (i.e. a "directory"), the operation is bound to the same permissions documented in Read File Permissions.
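As an illustration of how these permissions might be granted together, the sketch below builds an IAM policy document in Python. The helper name build_policy and the example bucket and prefix are hypothetical. s3:DeleteObject is the standard AWS action for deleting objects and is included here on the assumption that removing, moving, or Delete After Send is needed; trim the action list to what your operators actually use.

```python
import json


def build_policy(bucket: str, prefix: str = "") -> dict:
    """Build an IAM policy covering read, write, remove, and listing (sketch)."""
    list_statement = {
        "Effect": "Allow",
        "Action": ["s3:ListBucket"],
        # ListBucket is granted on the bucket itself, not on its objects.
        "Resource": [f"arn:aws:s3:::{bucket}"],
    }
    if prefix:
        # Narrow listing to a "directory" inside the bucket.
        list_statement["Condition"] = {"StringLike": {"s3:prefix": [f"{prefix}*"]}}
    object_statement = {
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
        # Object-level actions are granted on the objects under the prefix.
        "Resource": [f"arn:aws:s3:::{bucket}/{prefix}*"],
    }
    return {"Version": "2012-10-17", "Statement": [object_statement, list_statement]}


policy = build_policy("MyBucket", "MyFolder/")
print(json.dumps(policy, indent=2))
```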

Restrictions

S3-specific restrictions of the operators are documented here. Some apply broadly to every operator:
  • Directories:

    In order for a path to be interpreted as a directory, it should end with /. For example: /tmp/ is a directory, while /tmp is a file named tmp.

  • Working directory:

    Since there is no concept of a "working directory" in S3, any relative path given to or returned by this service is resolved against the root directory (/).
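The two general rules above can be sketched in a few lines of Python. The function names are hypothetical and only illustrate the documented interpretation of paths; they are not part of any operator API.

```python
def resolve(path: str) -> str:
    """Resolve a relative path against the root directory (/)."""
    return path if path.startswith("/") else "/" + path


def is_directory(path: str) -> bool:
    """A path is interpreted as a directory only if it ends with '/'."""
    return path.endswith("/")
```

Under this sketch, /tmp/ is a directory, /tmp is a file named tmp, and the relative path data/in.csv resolves to /data/in.csv.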

Write File Restrictions

In "Append" mode, the whole file is retrieved from the service, the new data is appended, and the result is written back to S3, because the S3 API does not support appending. This compromises the operation's efficiency.
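The read-modify-write cycle behind "Append" can be sketched as follows. This is an illustration only, not the operator's code. InMemoryS3 is a hypothetical stand-in whose get_object and put_object methods take the same keyword arguments as the boto3 S3 client; treating a missing object as empty is an assumption, and the stand-in signals a missing key with KeyError, which a real client would report differently.

```python
import io


class InMemoryS3:
    """Hypothetical stand-in for an S3 client, for illustration only."""

    def __init__(self):
        self.objects = {}
        self.bytes_downloaded = 0

    def get_object(self, Bucket, Key):
        data = self.objects[(Bucket, Key)]  # raises KeyError if missing
        self.bytes_downloaded += len(data)
        return {"Body": io.BytesIO(data)}

    def put_object(self, Bucket, Key, Body):
        self.objects[(Bucket, Key)] = bytes(Body)


def append_to_object(client, bucket: str, key: str, data: bytes) -> None:
    """Emulate append: download the whole object, extend it, upload it again."""
    try:
        existing = client.get_object(Bucket=bucket, Key=key)["Body"].read()
    except KeyError:
        existing = b""  # assumption: appending to a missing object creates it
    client.put_object(Bucket=bucket, Key=key, Body=existing + data)


s3 = InMemoryS3()
append_to_object(s3, "MyBucket", "log.txt", b"first\n")
append_to_object(s3, "MyBucket", "log.txt", b"second\n")
```

Every append downloads the full current object, so the cost grows with the size of the object rather than with the size of the appended data.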

Move File Restrictions

As the S3 API does not support a move operation, moving consists of a copy followed by the removal of the source file. Thus, if a failure occurs, the file may have been copied but not removed.
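The copy-then-remove sequence, and the state left behind when the removal fails, can be sketched as follows. FakeS3 is a hypothetical stand-in whose copy_object and delete_object methods take the same keyword arguments as the boto3 S3 client; the simulated failure is an assumption used only to show the partial result.

```python
class FakeS3:
    """Hypothetical stand-in for an S3 client, for illustration only."""

    def __init__(self, objects, fail_delete=False):
        self.objects = dict(objects)
        self.fail_delete = fail_delete

    def copy_object(self, Bucket, Key, CopySource):
        source = (CopySource["Bucket"], CopySource["Key"])
        self.objects[(Bucket, Key)] = self.objects[source]

    def delete_object(self, Bucket, Key):
        if self.fail_delete:
            raise RuntimeError("simulated delete failure")
        del self.objects[(Bucket, Key)]


def move_object(client, bucket: str, source: str, destination: str) -> None:
    """Emulate move: copy the object, then remove the source."""
    client.copy_object(
        Bucket=bucket, Key=destination, CopySource={"Bucket": bucket, "Key": source}
    )
    # If this step fails, the copy already exists and the source remains.
    client.delete_object(Bucket=bucket, Key=source)


ok = FakeS3({("MyBucket", "a/file1.txt"): b"data"})
move_object(ok, "MyBucket", "a/file1.txt", "c/file1.txt")

broken = FakeS3({("MyBucket", "a/file1.txt"): b"data"}, fail_delete=True)
try:
    move_object(broken, "MyBucket", "a/file1.txt", "c/file1.txt")
except RuntimeError:
    pass  # both a/file1.txt and c/file1.txt now exist
```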

Further restrictions are documented in Copy File Restrictions.

Copy File Restrictions

Given that the operation has a "source" and a "destination" path:
  • If the destination is a file, source must also be a file.

  • If the destination is a directory, it must be empty.

For instance, in the given file structure:
.
|
+-- a
|   +-- file1.txt
|   +-- file2.txt
+-- b
    +-- f1.txt
    +-- f2.txt
  • Copying source: a/file1.txt to destination: newfile.txt would succeed, since the destination does not exist.

  • Copying source: a/file1.txt to destination: b/f1.txt would succeed and overwrite b/f1.txt, since the destination is an existing file.

  • Copying source: a/file1.txt to destination: b/ would fail, since b/ already exists and is not empty.

  • Copying source: a/ to destination: b/ would fail, since b/ already exists and is not empty.

  • Copying source: a/ to destination: b/dir/ would succeed, since b/dir/ does not exist.
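The two copy rules and the five cases above can be checked with a small sketch. The function name can_copy is hypothetical, and modeling the store as a set of object keys is an assumption made for the example; the operators' actual checks may differ in detail.

```python
def can_copy(existing_keys: set, source: str, destination: str) -> bool:
    """Return True if a copy would be allowed under the documented restrictions."""
    source_is_dir = source.endswith("/")
    destination_is_dir = destination.endswith("/")
    if not destination_is_dir:
        # Destination is a file: the source must also be a file.
        # An existing destination file is overwritten.
        return not source_is_dir
    # Destination is a directory: it must be empty (no key under its prefix).
    return not any(key.startswith(destination) for key in existing_keys)


keys = {"a/file1.txt", "a/file2.txt", "b/f1.txt", "b/f2.txt"}
results = [
    can_copy(keys, "a/file1.txt", "newfile.txt"),
    can_copy(keys, "a/file1.txt", "b/f1.txt"),
    can_copy(keys, "a/file1.txt", "b/"),
    can_copy(keys, "a/", "b/"),
    can_copy(keys, "a/", "b/dir/"),
]
```

The entries of results follow the order of the cases listed above.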