Modeling Guide for SAP Data Hub

WebHDFS

WebHDFS provides access to the Hadoop Distributed File System (HDFS) through a REST API. It is one of the protocols of Apache Hadoop's distributed storage solution. For more information, see the official WebHDFS home page.

Many of the SAP Data Hub storage operators offer support for WebHDFS. This documentation covers the characteristics that this service has in common across operators.

Connection

To use any operator that connects to WebHDFS, either select a Connection ID from the Connection Management or set up a Manual connection with the following values:
  • HDFS Host [Mandatory]
    The IP address of the Hadoop name node.
    • ID: host
    • Type: string
    • Default: "localhost"
  • HDFS Port
    The port of the Hadoop name node. If not specified, the protocol's default port is used.
    • ID: port
    • Type: string
    • Default: "50070"
  • Protocol
    The protocol to be used. The WebHDFS service supports the webhdfs and swebhdfs protocols. To use the rpc protocol, choose the HDFS service in the configurations instead.
    • ID: protocol
    • Type: string
    • Default: "rpc"
  • User
    The Hadoop user name.
    • ID: user
    • Type: string
    • Default: "hdfs"
  • Root Path
    The optional root path for browsing. It starts with a slash (e.g. /MyFolder/MySubfolder).
    • ID: rootPath
    • Type: string
    • Default: ""
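The following Python sketch shows how the manual connection values above could be combined into a WebHDFS REST URL of the form <scheme>://<host>:<port>/webhdfs/v1/<path>?op=<OP>. The class WebHdfsConnection, the host namenode.example.com, the mapping of webhdfs to HTTP and swebhdfs to HTTPS, the fallback ports (Hadoop 2.x NameNode defaults) and the prefixing of every path with the root path are all assumptions for illustration, not the operators' actual implementation. Because rpc belongs to the HDFS service, the sketch rejects it.

```python
from dataclasses import dataclass
from urllib.parse import quote, urlencode

# Assumption: webhdfs uses HTTP and swebhdfs uses HTTPS, with the Hadoop 2.x
# NameNode defaults as fallback ports (Hadoop 3.x uses 9870 and 9871).
SCHEMES = {"webhdfs": "http", "swebhdfs": "https"}
DEFAULT_PORTS = {"webhdfs": "50070", "swebhdfs": "50470"}


@dataclass
class WebHdfsConnection:
    """Hypothetical holder for the manual connection values listed above."""
    host: str = "localhost"
    port: str = ""  # empty string: fall back to the protocol's default port
    protocol: str = "webhdfs"
    user: str = "hdfs"
    root_path: str = ""

    def url(self, path: str, op: str) -> str:
        """Build <scheme>://<host>:<port>/webhdfs/v1/<rootPath>/<path>?op=<op>."""
        if self.protocol not in SCHEMES:
            # rpc belongs to the HDFS service, not to WebHDFS.
            raise ValueError(f"unsupported WebHDFS protocol: {self.protocol!r}")
        port = self.port or DEFAULT_PORTS[self.protocol]
        # Assumption: every path is resolved below the configured root path.
        full_path = self.root_path.rstrip("/") + "/" + path.lstrip("/")
        query = urlencode({"op": op, "user.name": self.user})
        return (f"{SCHEMES[self.protocol]}://{self.host}:{port}"
                f"/webhdfs/v1{quote(full_path)}?{query}")


conn = WebHdfsConnection(host="namenode.example.com", root_path="/MyFolder")
print(conn.url("data.csv", "OPEN"))
# prints http://namenode.example.com:50070/webhdfs/v1/MyFolder/data.csv?op=OPEN&user.name=hdfs
```

Leaving the port empty mirrors the "use the protocol's default port" behavior described for HDFS Port.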

Additional connection settings that are not available in the Connection Management can also be set:
  • Token
    The token used to authenticate with WebHDFS.
    • ID: webhdfsToken
    • Type: string
    • Default: ""
  • OAuth Token
    The OAuth token used to authenticate with WebHDFS.
    • ID: webhdfsOAuthToken
    • Type: string
    • Default: ""
  • Do As
    The user to impersonate. Must be used together with User.
    • ID: webhdfsDoAs
    • Type: string
    • Default: ""
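As a rough illustration, the Python sketch below maps these extra settings onto the parts of a WebHDFS request. The mapping itself is an assumption: it treats webhdfsToken as a Hadoop delegation token (the delegation query parameter), webhdfsOAuthToken as an OAuth2 bearer token (the Authorization header) and webhdfsDoAs as the doas proxy-user parameter. The function name auth_request_parts is hypothetical.

```python
from urllib.parse import urlencode


def auth_request_parts(user, token="", oauth_token="", do_as=""):
    """Map the extra settings onto WebHDFS query parameters and HTTP headers.

    Hypothetical mapping (assumptions, not the operators' documented behavior):
      webhdfsToken      -> "delegation" query parameter (delegation token)
      webhdfsOAuthToken -> "Authorization: Bearer <token>" header
      webhdfsDoAs       -> "doas" query parameter (proxy user)
    """
    if do_as and not user:
        # Do As has to be combined with User, as stated above.
        raise ValueError("Do As has to be used together with User")
    params = {}
    if token:
        # A delegation token identifies the caller, so user.name is omitted.
        params["delegation"] = token
    elif user:
        params["user.name"] = user
    if do_as:
        params["doas"] = do_as
    headers = {}
    if oauth_token:
        headers["Authorization"] = f"Bearer {oauth_token}"
    return urlencode(params), headers


query, headers = auth_request_parts("admin", do_as="alice")
print(query)  # user.name=admin&doas=alice
```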

Permissions

WebHDFS permissions for files and directories are based on the POSIX model: each file or directory has W (write), R (read) and X (execute) permissions that can be granted separately to the owner, to the group associated with the file or directory, and to all other users.

For finer control, you can define an Access Control List (ACL), which allows specific rules for individual users or groups of users.
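To make the POSIX model concrete, the Python sketch below expands an octal mode, as reported in the "permission" field of WebHDFS GETFILESTATUS and LISTSTATUS responses, into R/W/X flags for the owner, the group and all other users. The function names are hypothetical, and ACL entries are deliberately not modeled, since their evaluation rules go beyond this sketch.

```python
def permission_flags(octal: str) -> dict:
    """Expand an octal mode such as "750" into R/W/X flags per class.

    WebHDFS reports modes in this octal form in the "permission" field of
    GETFILESTATUS and LISTSTATUS responses.
    """
    value = int(octal, 8)
    flags = {}
    # Owner bits are the highest three, then group, then all other users.
    for name, shift in (("owner", 6), ("group", 3), ("other", 0)):
        bits = (value >> shift) & 0o7
        flags[name] = {"R": bool(bits & 0o4), "W": bool(bits & 0o2),
                       "X": bool(bits & 0o1)}
    return flags


def symbolic(octal: str) -> str:
    """Render the same mode in ls-style notation, e.g. "750" -> "rwxr-x---"."""
    flags = permission_flags(octal)
    return "".join(
        ("r" if f["R"] else "-") + ("w" if f["W"] else "-") + ("x" if f["X"] else "-")
        for f in flags.values()
    )


print(symbolic("750"))  # rwxr-x---
```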

Read File Permissions

To read a file, you need W and R permissions on the file.

Write File Permissions

To write a new file, you need W permission on the directory where the file will be created.

To append or overwrite an existing file, you need W permission on the file.

Remove File Permissions

To remove a file or directory, you need W and R permissions on the corresponding file/directory.
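The read, write and remove rules above can be written down as a lookup table. The Python sketch below transcribes them literally and reports which permissions a user still lacks for an operation; the operation names, the object keys and the function missing_permissions are hypothetical, and how the effective permissions are obtained (mode bits, ACLs) is left to the caller.

```python
# Permissions each operation requires, transcribed from the rules above.
# Keys name the object the permissions must be held on.
REQUIRED = {
    "read": {"file": {"W", "R"}},
    "write_new": {"directory": {"W"}},  # directory where the file is created
    "append": {"file": {"W"}},
    "overwrite": {"file": {"W"}},
    "remove": {"target": {"W", "R"}},  # the file or directory itself
}


def missing_permissions(operation: str, granted: dict) -> dict:
    """Return the permissions still missing for `operation`.

    `granted` maps "file", "directory" or "target" to the set of permissions
    the user effectively holds on that object (for example, derived from the
    octal mode and any ACL entries).
    """
    missing = {}
    for obj, needed in REQUIRED[operation].items():
        lacking = needed - granted.get(obj, set())
        if lacking:
            missing[obj] = lacking
    return missing


print(missing_permissions("read", {"file": {"R"}}))  # {'file': {'W'}}
```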

Move File Permissions

  • Moving a File:

    To move a file, you need W permission on the original file and R permission on the original directory.

    If the destination file already exists and is being overwritten, you need W permission on the destination file.

    If the file does not exist at the destination, you need W permission on the destination directory.

  • Moving a Directory:

    To move a directory, you need W and R permissions on the original directory and W permission on every file within it.

    You also need W permission on the destination folder.

    If any of the files already exists at the destination and needs to be overwritten, you also need W permission on that destination file.
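The move rules above can be sketched as a function that lists every (path, permissions) pair a move requires. This Python sketch follows the rules as stated in this section; the function name and parameters are hypothetical, and it assumes that for a directory move dst names the destination folder and the caller already knows which files would be overwritten.

```python
import posixpath


def move_requirements(src, dst, *, is_directory, dst_exists=False,
                      src_files=(), overwritten_files=()):
    """List (path, permissions) pairs needed to move `src` to `dst`.

    Follows the rules stated above. For a directory move, `src_files` are the
    files inside `src`, `dst` is the destination folder and
    `overwritten_files` are destination files that already exist.
    """
    needs = []
    if not is_directory:
        needs.append((src, {"W"}))                      # original file
        needs.append((posixpath.dirname(src), {"R"}))   # original directory
        if dst_exists:
            needs.append((dst, {"W"}))                  # file being overwritten
        else:
            needs.append((posixpath.dirname(dst), {"W"}))  # destination directory
    else:
        needs.append((src, {"R", "W"}))                 # original directory
        needs.extend((f, {"W"}) for f in src_files)     # every file within it
        needs.append((dst, {"W"}))                      # destination folder
        needs.extend((f, {"W"}) for f in overwritten_files)
    return needs


for path, perms in move_requirements("/in/a.csv", "/out/a.csv", is_directory=False):
    print(path, sorted(perms))
```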

Restrictions

WebHDFS-specific restrictions in the operators are documented here. Some of them may apply broadly to every storage operator:
  • Working directory:

    Since WebHDFS has no concept of a working directory, any relative path given to or returned by this service is resolved against the root directory (/).
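A minimal Python sketch of this resolution rule, assuming that "." and ".." segments are also normalized (the function name resolve is hypothetical):

```python
import posixpath


def resolve(path: str) -> str:
    """Resolve a path against the root directory (/), since WebHDFS has no
    working directory; "." and ".." segments are normalized."""
    return posixpath.normpath(posixpath.join("/", path))


print(resolve("MyFolder/../data/file.csv"))  # /data/file.csv
```

Absolute paths pass through unchanged, while a relative path such as data/file.csv becomes /data/file.csv.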

Copy File Restrictions

Since the WebHDFS API does not support a copy operation, copying is achieved by reading the source file and writing its contents to the destination (Read + Write).
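The Python sketch below emulates a copy with the two WebHDFS REST operations involved: OPEN to read the source and CREATE to write the destination, where CREATE is a two-step exchange (the NameNode redirects to a DataNode, which receives the data). The function names, the base URL form http://<host>:<port> and the in-memory buffering are assumptions for illustration; this is not how the SAP Data Hub operators implement copying.

```python
import urllib.error
import urllib.request
from urllib.parse import quote, urlencode


def webhdfs_url(base: str, path: str, op: str, **params: str) -> str:
    """Build <base>/webhdfs/v1/<path>?op=<op>&...; base is e.g. http://host:50070."""
    query = urlencode({"op": op, **params})
    return f"{base}/webhdfs/v1{quote(path)}?{query}"


def copy_file(base: str, src: str, dst: str, user: str = "hdfs",
              overwrite: bool = False) -> int:
    """Emulate a copy with a read (OPEN) followed by a write (CREATE)."""
    # Read: GET ...?op=OPEN; urllib follows the 307 redirect to a DataNode.
    with urllib.request.urlopen(webhdfs_url(base, src, "OPEN", **{"user.name": user})) as resp:
        data = resp.read()  # whole file in memory: acceptable for a sketch only

    # Write, step 1: PUT ...?op=CREATE without a body. The NameNode answers
    # 307 with the DataNode location; urllib does not follow 307 for PUT and
    # raises HTTPError instead, which carries the Location header.
    create_url = webhdfs_url(base, dst, "CREATE",
                             overwrite=str(overwrite).lower(), **{"user.name": user})
    try:
        with urllib.request.urlopen(urllib.request.Request(create_url, method="PUT")):
            pass
    except urllib.error.HTTPError as err:
        if err.code != 307:
            raise
        location = err.headers["Location"]
    else:
        raise RuntimeError("expected a 307 redirect from the NameNode")

    # Write, step 2: send the content to the DataNode (201 Created on success).
    upload = urllib.request.Request(location, data=data, method="PUT",
                                    headers={"Content-Type": "application/octet-stream"})
    with urllib.request.urlopen(upload) as resp:
        return resp.status
```

Because the copy is a plain read followed by a write, it needs the read permissions on the source and the write permissions on the destination described in the Permissions section.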