Modeling Guide for SAP Data Hub

Hadoop File System (HDFS)

Hadoop Distributed File System is Apache's distributed storage solution. For more information, see the official HDFS documentation.

Many SAP Data Hub storage operators support HDFS. This section describes the characteristics of the HDFS service that are common to all of these operators.

Connection

To use any operator that connects to HDFS, either select a Connection ID from Connection Management or configure a manual connection with the following values:
  • HDFS Host [Mandatory]
    The IP address of the Hadoop name node.
    • ID: host

    • Type: string

    • Default: "127.0.0.1"

  • HDFS Port
    The port of the Hadoop name node. If not specified, the protocol's default port is used.
    • ID: port

    • Type: string

    • Default: "8020"

  • Protocol
    The protocol to use. When the HDFS service is selected, the protocol must be rpc; other values will not work properly. To use the webhdfs or swebhdfs protocols, select the WebHDFS service in the configuration instead.
    • ID: protocol

    • Type: string

    • Default: "rpc"

  • User
    The Hadoop username.
    • ID: user

    • Type: string

    • Default: "hdfs"

  • Root Path
    Optional root path for browsing. It must start with a slash (for example, /MyFolder/MySubfolder).
    • ID: rootPath

    • Type: string

    • Default: ""
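The manual connection values above can be summarized as a small configuration object. The following is a minimal Python sketch, assuming a hypothetical `HdfsConnection` class that is not part of SAP Data Hub; its field names mirror the documented IDs and defaults, and `validate` encodes only two rules stated above: a non-empty root path starts with a slash, and only rpc works with the HDFS service. The host address in the usage line is an arbitrary example.

```python
from dataclasses import asdict, dataclass


@dataclass
class HdfsConnection:
    """Hypothetical model of a manual HDFS connection (not an SAP Data Hub API).

    Field names and defaults mirror the documented parameter IDs.
    """
    host: str = "127.0.0.1"   # HDFS Host (mandatory): IP address of the name node
    port: str = "8020"        # HDFS Port
    protocol: str = "rpc"     # Protocol
    user: str = "hdfs"        # User
    rootPath: str = ""        # Root Path (optional)

    def validate(self, service: str = "HDFS") -> None:
        """Raise ValueError if the values break the rules described above."""
        if not self.host:
            raise ValueError("host is mandatory")
        # When set, the root path must start with a slash.
        if self.rootPath and not self.rootPath.startswith("/"):
            raise ValueError("rootPath must start with '/'")
        # With the HDFS service, only rpc works properly; webhdfs and
        # swebhdfs require the WebHDFS service to be selected instead.
        if service == "HDFS" and self.protocol != "rpc":
            raise ValueError("protocol must be 'rpc' for the HDFS service")


conn = HdfsConnection(host="10.0.0.5", rootPath="/MyFolder/MySubfolder")
conn.validate()
print(asdict(conn))
# {'host': '10.0.0.5', 'port': '8020', 'protocol': 'rpc', 'user': 'hdfs', 'rootPath': '/MyFolder/MySubfolder'}
```

Passing `service="WebHDFS"` skips the rpc check, which is where webhdfs or swebhdfs would be used.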

Permissions

HDFS permissions for files and directories are based on the POSIX model: for each file or directory, read (R), write (W), and execute (X) permissions can be granted separately to the owner, to the group associated with the file or directory, and to all other users.
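In the POSIX model, exactly one class (owner, group, or other) is consulted for a given user, and each class holds an R/W/X triplet in the octal mode. The Python sketch below illustrates that check; `posix_allows` is a hypothetical helper, not an HDFS or SAP Data Hub function, and it deliberately ignores the HDFS superuser and ACLs.

```python
from __future__ import annotations

# Bit value of each permission inside one rwx triplet.
PERM_BITS = {"r": 4, "w": 2, "x": 1}


def posix_allows(mode: int, owner: str, group: str,
                 user: str, user_groups: set[str], perm: str) -> bool:
    """Return True if `user` holds `perm` ('r', 'w' or 'x') under `mode`.

    Exactly one class is checked: owner if the user owns the path, else
    group if the user belongs to the path's group, else other.
    """
    if user == owner:
        shift = 6          # owner triplet: bits 8..6
    elif group in user_groups:
        shift = 3          # group triplet: bits 5..3
    else:
        shift = 0          # other triplet: bits 2..0
    return bool((mode >> shift) & PERM_BITS[perm])


# rwxr-x--- : owner alice has rwx, group analysts has r-x, others have nothing.
mode = 0o750
print(posix_allows(mode, "alice", "analysts", "bob", {"analysts"}, "r"))  # True
print(posix_allows(mode, "alice", "analysts", "bob", {"analysts"}, "w"))  # False
print(posix_allows(mode, "alice", "analysts", "carol", set(), "r"))       # False
```

Because only one class applies, a group member can be denied access that "other" users have (for example, mode 0o604).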

For finer control, you can define an Access Control List (ACL), which allows specific rules for individual users or groups of users.
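HDFS ACLs extend the POSIX classes with named user and named group entries plus a mask that limits them. The Python sketch below is a simplified model of the evaluation order described in the Apache HDFS Permissions Guide (owner, then named users, then groups, then other). The `Acl` class and `acl_allows` function are hypothetical illustrations, not the HDFS or SAP Data Hub API; default ACLs and the superuser are not modeled, and the `mask` default of 0o7 is an assumption of this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass, field

PERM_BITS = {"r": 4, "w": 2, "x": 1}


@dataclass
class Acl:
    """Hypothetical ACL record; permissions are rwx triplets such as 0o6."""
    owner: str
    group: str
    owner_perm: int
    group_perm: int            # entry for the owning group
    other_perm: int
    mask: int = 0o7            # upper bound for named and group entries
    named_users: dict[str, int] = field(default_factory=dict)
    named_groups: dict[str, int] = field(default_factory=dict)


def acl_allows(acl: Acl, user: str, user_groups: set[str], perm: str) -> bool:
    bit = PERM_BITS[perm]
    # 1. The owner is checked against the owner entry only.
    if user == acl.owner:
        return bool(acl.owner_perm & bit)
    # 2. A named user entry applies next, filtered by the mask.
    if user in acl.named_users:
        return bool(acl.named_users[user] & acl.mask & bit)
    # 3. Owning-group and named-group entries matching any of the user's groups.
    matched = [p for g, p in acl.named_groups.items() if g in user_groups]
    if acl.group in user_groups:
        matched.append(acl.group_perm)
    if matched:
        # Granted if any matching entry allows it after masking; otherwise
        # denied without falling back to the "other" entry.
        return any(p & acl.mask & bit for p in matched)
    # 4. Everyone else gets the "other" entry.
    return bool(acl.other_perm & bit)


acl = Acl(owner="alice", group="analysts", owner_perm=0o7, group_perm=0o5,
          other_perm=0o0, mask=0o5, named_users={"bob": 0o6},
          named_groups={"auditors": 0o4})
print(acl_allows(acl, "bob", set(), "r"))           # True
print(acl_allows(acl, "bob", set(), "w"))           # False: the mask removes w
print(acl_allows(acl, "carol", {"auditors"}, "w"))  # False
```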

Read File Permissions

To read a file, you need W and R permissions on the file.

Write File Permissions

To write a new file, you need W permission on the directory where the file will be created.

To append or overwrite an existing file, you need W permission on the file.

Remove File Permissions

To remove a file or directory, you need W and R permissions on the corresponding file/directory.
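The read, write, and remove requirements listed in this guide can be expressed as a lookup from operation to the (path, permissions) pairs to check. The function below is a hypothetical Python sketch that simply restates those rules; it is not an SAP Data Hub API and does not query HDFS.

```python
from __future__ import annotations

import posixpath


def required_permissions(op: str, path: str,
                         exists: bool = True) -> list[tuple[str, str]]:
    """Return the (path, permissions) pairs this guide lists for `op`.

    `op` is "read", "write" or "remove"; `exists` only matters for "write",
    where it distinguishes append/overwrite from creating a new file.
    """
    if op == "read":
        return [(path, "rw")]                    # W and R on the file
    if op == "write":
        if exists:
            return [(path, "w")]                 # append or overwrite
        return [(posixpath.dirname(path), "w")]  # new file: W on parent dir
    if op == "remove":
        return [(path, "rw")]                    # W and R on the file/directory
    raise ValueError(f"unknown operation: {op!r}")


print(required_permissions("write", "/data/new.csv", exists=False))
# [('/data', 'w')]
```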

Move File Permissions

  • Moving a File:

    To move a file, you need W permission on the original file and R permission on the directory that contains it.

    If the destination file already exists and is being overwritten, you need W permission on the destination file.

    If the file does not exist at the destination, you need W permission on the destination directory instead.

  • Moving a Directory:

    To move a directory, you need W and R permissions on the original directory and W permission on every file within it.

    You also need W permission on the destination directory.

    If any of the files already exists at the destination and must be overwritten, you also need W permission on that destination file.
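The move rules above depend on whether a file or a directory is moved and on what already exists at the destination. The Python sketch below restates them as two hypothetical helper functions (not an SAP Data Hub API); for directories, `files` holds paths relative to the source directory and `existing_at_dst` names the ones already present at the destination, both assumptions of this sketch.

```python
from __future__ import annotations

import posixpath


def move_file_requirements(src: str, dst: str,
                           dst_exists: bool) -> list[tuple[str, str]]:
    """Permissions this guide lists for moving a single file."""
    reqs = [(src, "w"), (posixpath.dirname(src), "r")]
    if dst_exists:
        reqs.append((dst, "w"))                    # overwrite the destination file
    else:
        reqs.append((posixpath.dirname(dst), "w"))  # create it in the destination dir
    return reqs


def move_dir_requirements(src_dir: str, dst_dir: str, files: list[str],
                          existing_at_dst: set[str]) -> list[tuple[str, str]]:
    """Permissions this guide lists for moving a directory."""
    reqs = [(src_dir, "rw")]
    reqs += [(posixpath.join(src_dir, f), "w") for f in files]
    reqs.append((dst_dir, "w"))
    # Files that already exist at the destination are overwritten.
    reqs += [(posixpath.join(dst_dir, f), "w") for f in files if f in existing_at_dst]
    return reqs


print(move_file_requirements("/in/a.csv", "/out/a.csv", dst_exists=False))
# [('/in/a.csv', 'w'), ('/in', 'r'), ('/out', 'w')]
```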

Restrictions

Any HDFS-specific restrictions of the operators are documented here. Some of them may apply to every storage operator:
  • Working directory:

    Because there is no concept of a working directory, any relative path passed to or returned by this service is resolved with the root directory (/) as the working directory.
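Resolving a relative path against the root directory amounts to joining it onto "/" and normalizing the result. The Python sketch below shows this with the standard `posixpath` module; `resolve_hdfs_path` is a hypothetical helper for illustration only.

```python
import posixpath


def resolve_hdfs_path(path: str) -> str:
    """Resolve `path` with the root directory (/) as the working directory."""
    # join() leaves absolute paths unchanged and prefixes relative ones with "/";
    # normpath() then collapses "." and ".." segments.
    return posixpath.normpath(posixpath.join("/", path))


print(resolve_hdfs_path("data/in.csv"))   # /data/in.csv
print(resolve_hdfs_path("/tmp/out.csv"))  # /tmp/out.csv
print(resolve_hdfs_path("a/../b.csv"))    # /b.csv
```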

Copy File Restrictions

Because the HDFS API does not provide a copy operation, copying is achieved by reading the source file and writing its content to the destination (Read + Write).
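A copy built from Read + Write streams the source in chunks and writes each chunk to the destination, so the whole file never has to fit in memory. The Python sketch below assumes a hypothetical `FileSystem` interface with `open_read` and `open_write` methods; it is not an HDFS or SAP Data Hub API, and `MemoryFS` is an in-memory stand-in used only to demonstrate the technique.

```python
from __future__ import annotations

import io
from typing import BinaryIO, Protocol


class FileSystem(Protocol):
    """Hypothetical minimal storage interface (not an HDFS or SAP API)."""
    def open_read(self, path: str) -> BinaryIO: ...
    def open_write(self, path: str) -> BinaryIO: ...


def copy_file(fs: FileSystem, src: str, dst: str, chunk_size: int = 1 << 20) -> int:
    """Copy src to dst by streaming reads into writes; return bytes copied."""
    copied = 0
    with fs.open_read(src) as reader, fs.open_write(dst) as writer:
        while chunk := reader.read(chunk_size):
            writer.write(chunk)
            copied += len(chunk)
    return copied


class MemoryFS:
    """In-memory stand-in used only to demonstrate copy_file."""
    def __init__(self) -> None:
        self.files: dict[str, bytes] = {}

    def open_read(self, path: str) -> BinaryIO:
        return io.BytesIO(self.files[path])

    def open_write(self, path: str) -> BinaryIO:
        fs = self

        class _Writer(io.BytesIO):
            def close(self) -> None:
                # Commit the written bytes once, when the writer is closed.
                if not self.closed:
                    fs.files[path] = self.getvalue()
                super().close()

        return _Writer()


fs = MemoryFS()
fs.files["/in/data.csv"] = b"a,b\n1,2\n"
n = copy_file(fs, "/in/data.csv", "/out/data.csv", chunk_size=4)
print(n, fs.files["/out/data.csv"])  # 8 b'a,b\n1,2\n'
```

Because the copy is two separate operations, it needs the read permissions on the source and the write permissions on the destination described earlier.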