Spark Submit
The Spark Submit operator is a wrapper for spark-submit.
It requires a Spark installation, and the SPARK_HOME environment variable must point to it. If YARN is used (via the master parameter), then HADOOP_CONF_DIR must point to a directory containing core-site.xml and yarn-site.xml with the correct settings for connecting to the Hadoop cluster through YARN.
If SAP Vora Pipeline Engine is running in cluster mode, it uses a Docker image that provides the necessary environment. The YARN configuration is retrieved from the yarn.resourcemanager.address, yarn.resourcemanager.hostname, and fs.defaultFS parameters. Because the appjar parameter is resolved relative to that environment, it must be a path inside the Docker container; otherwise, use the binary input to stream in a JAR.
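The YARN settings above are standard Hadoop properties. A hypothetical example with placeholder host names (the ports shown are the usual Hadoop defaults, 8032 for the ResourceManager and 8020 for HDFS):

```
yarn.resourcemanager.address=rm.example.com:8032
yarn.resourcemanager.hostname=rm.example.com
fs.defaultFS=hdfs://namenode.example.com:8020
```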
Configuration Parameters
Parameter | Type | Description
---|---|---
master | string | The value for the "--master" argument of spark-submit. Default: "yarn"
deploymode | string | The value for the "--deploy-mode" argument of spark-submit. Default: "cluster"
class | string | Mandatory. The value for the "--class" argument of spark-submit. Default: "org.apache.spark.examples.SparkPi"
appjar | string | Mandatory. The path to the JAR to be executed. Optionally, the &workingDirectory& variable can be used, which expands to the path of the current operator directory in the repository (usage: "&workingDirectory&/my_app.jar"). Default: "/usr/local/spark/examples/spark-examples_2.10-1.1.1.jar"
jars | string | JARs that are added to spark-submit via the "--jars" argument. As with appjar, the &workingDirectory& variable is available. Default: ""
packages | string | Packages that are added to spark-submit via the "--packages" argument. Default: ""
conf | string | The value for the "--conf" argument of spark-submit; each "key=value" pair goes on its own line (see the sketch following this table). Default: ""
impersonateUser | string | The user name used to access YARN and HDFS, and the value for the "--proxy-user" argument of spark-submit if the cluster is kerberized. Default: "vora"
shutdownOnFailure | boolean | If true, the operator terminates as soon as a single error occurs; if false, it keeps running. Default: false
secContext | string | The security context to be used to connect to the Hadoop system. Default: "default"
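To make the flag mapping concrete, the following is a minimal sketch in Scala of how the parameters above might translate into a spark-submit argument list. It is not the operator's actual implementation; the object name, the kerberized switch, and the handling of empty parameters are assumptions, while the parameter names and defaults follow the table.

```scala
// Sketch only: maps the operator's configuration parameters onto
// spark-submit command-line flags. Defaults mirror the table above.
object SparkSubmitArgs {
  def build(
      master: String = "yarn",
      deployMode: String = "cluster",
      mainClass: String = "org.apache.spark.examples.SparkPi",
      appJar: String = "/usr/local/spark/examples/spark-examples_2.10-1.1.1.jar",
      jars: String = "",
      packages: String = "",
      conf: String = "",
      impersonateUser: String = "vora",
      kerberized: Boolean = false, // assumption: --proxy-user is only added on kerberized clusters
      appArgs: Seq[String] = Seq.empty): Seq[String] = {
    // Each newline-separated "key=value" pair becomes its own --conf flag.
    val confFlags = conf.split('\n').filter(_.nonEmpty).toSeq.flatMap(kv => Seq("--conf", kv))
    Seq("spark-submit", "--master", master, "--deploy-mode", deployMode, "--class", mainClass) ++
      (if (jars.nonEmpty) Seq("--jars", jars) else Nil) ++
      (if (packages.nonEmpty) Seq("--packages", packages) else Nil) ++
      confFlags ++
      (if (kerberized) Seq("--proxy-user", impersonateUser) else Nil) ++
      (appJar +: appArgs)
  }
}
```

With all defaults, SparkSubmitArgs.build().mkString(" ") yields spark-submit --master yarn --deploy-mode cluster --class org.apache.spark.examples.SparkPi /usr/local/spark/examples/spark-examples_2.10-1.1.1.jar.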
Input
Input | Type | Description
---|---|---
args | string | A string of arguments that is passed to the JAR that is executed (see the example following this table).
binary | blob | An application JAR to be executed. If this input is connected, appjar is ignored.
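For illustration, here is a hypothetical application entry point showing how a string arriving on the args input port reaches the submitted JAR as ordinary command-line arguments. MyApp is an assumed class name that would be set as the class parameter, with its JAR referenced by appjar or streamed in via binary:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object MyApp {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("MyApp"))
    // args holds whatever the operator forwarded from its args input port.
    val partitions = if (args.nonEmpty) args(0).toInt else 2
    val count = sc.parallelize(1 to 1000, partitions).count()
    println(s"Counted $count elements across $partitions partitions")
    sc.stop()
  }
}
```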
Output
Output | Type | Description
---|---|---
success | string | When a Spark job succeeds, emits the corresponding argument passed in via args, with "done" appended.
failure | string | When a Spark job fails, emits the corresponding argument passed in via args, with "error" appended.