Modeling Guide

Livy Spark Submit

The Livy Spark Submitter operator submits jobs to a cluster using the Livy REST API. It has two modes: jar and snippet. Once a mode is chosen, the configuration tab shows only the parameters relevant to that mode.

In jar mode, you submit an application in much the same way as with spark-submit. The operator succeeds only if the underlying job finishes successfully. For instance, if a jar file is submitted to YARN, the operator status matches the application status in YARN. Note that the jar file must be accessible to Livy.
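For orientation, a jar-mode submission corresponds to a request against the batch endpoint of the Livy REST API (POST /batches). The following is a minimal sketch that only builds such a request; it is not the operator's implementation. The helper name build_batch_request is hypothetical, and the endpoint, jar path, and class name are the placeholder defaults from the parameter list in this document.

```python
import json


def build_batch_request(livy_endpoint, jar_path, class_name, args=None, name=None):
    """Build the URL and JSON body for a Livy POST /batches request.

    Hypothetical helper for illustration; not part of the operator.
    """
    payload = {"file": jar_path, "className": class_name}
    if args:
        payload["args"] = list(args)
    if name:
        payload["name"] = name
    url = livy_endpoint.rstrip("/") + "/batches"
    return url, json.dumps(payload)


# Placeholder values taken from the defaults listed in this document.
url, body = build_batch_request(
    "http://livy-api-endpoint.com:8998",
    "hdfs://path-to-jar",
    "org.com.smth.className",
    args=["arg1", "arg2"],
    name="default batch name",
)
print(url)
print(body)
```

Livy reports the state of a submitted batch through GET /batches/{batchId}/state, which is the kind of status the operator's success or failure in jar mode reflects.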

In snippet mode, code snippets are sent to a Livy session and the results are returned on the output port. This approach is very similar to using the Spark shell. Note that adding jars to sessions is subject to some limitations due to LIVY-327.
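In terms of the Livy REST API, snippet mode maps to creating an interactive session (POST /sessions) and then posting code to it (POST /sessions/{sessionId}/statements). The sketch below builds both requests without sending them. The helper names and the sample Scala snippet are assumptions made for illustration; the session kinds are the values listed for the snippetType parameter.

```python
import json

# Session kinds listed for the snippetType parameter.
SESSION_KINDS = ("spark", "pyspark", "pyspark3", "sparkr")


def build_session_request(livy_endpoint, kind="spark"):
    """Build the URL and body that ask Livy to create an interactive session."""
    if kind not in SESSION_KINDS:
        raise ValueError(f"unsupported session kind: {kind!r}")
    return livy_endpoint.rstrip("/") + "/sessions", json.dumps({"kind": kind})


def build_statement_request(livy_endpoint, session_id, code):
    """Build the URL and body that send one code snippet to an existing session."""
    url = f"{livy_endpoint.rstrip('/')}/sessions/{session_id}/statements"
    return url, json.dumps({"code": code})


endpoint = "http://livy-api-endpoint.com:8998"
session_url, session_body = build_session_request(endpoint, "spark")
# Session id 0 is a made-up value; Livy returns the real id when the session is created.
statement_url, statement_body = build_statement_request(
    endpoint, 0, "sc.parallelize(1 to 10).sum()"
)
print(session_url, session_body)
print(statement_url, statement_body)
```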

Unlike in jar mode, the operator does not fail by default when a code snippet sent to Livy fails. You can change this default behavior by setting strictSnippetExecutionMode to true in the configuration tab.
  • In strict mode, the operator checks whether the output of the code snippet execution contains any errors. If any errors are found, the operator fails.

  • In non-strict mode, the operator ignores any errors that occur during snippet execution. As a consequence, users have to analyze the result manually to see whether the execution was successful. This can be done by exploring the job execution output, which is sent to the output port of the operator.
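The difference between the two modes can be sketched as follows. The Livy REST API returns a statement output object whose status field is "ok" or "error", with ename and evalue describing the error. How the operator itself inspects the output is not documented here, so the function and exception names below are hypothetical and the check is only an approximation.

```python
class SnippetExecutionError(Exception):
    """Raised in strict mode when the snippet output reports an error."""


def check_snippet_output(output, strict):
    """Return the output unchanged; in strict mode, raise if it reports an error.

    `output` is assumed to be the `output` object of a Livy statement.
    """
    if strict and output.get("status") == "error":
        raise SnippetExecutionError(
            f"{output.get('ename', 'Error')}: {output.get('evalue', '')}"
        )
    return output


# Made-up example of a failed statement output.
failed = {"status": "error", "ename": "NameError", "evalue": "name 'x' is not defined"}

# Non-strict mode: the failed output is passed through for manual inspection.
print(check_snippet_output(failed, strict=False))

# Strict mode: the same output makes the check fail.
try:
    check_snippet_output(failed, strict=True)
except SnippetExecutionError as exc:
    print("strict mode failure:", exc)
```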

Configuration Parameters

Parameter

Type

Description

livyEndpoint

string

Mandatory. Defines the Livy endpoint to use, including the port number. If the Livy service cannot be reached, the operator fails during the initialization phase.

Default: "http://livy-api-endpoint.com:8998"

sourceType

string

Mandatory. Defines the type of job being submitted: "jar" for a jar file, "snippet" for a snippet of code.

Default: "jar"

errorHandlingMode

string

Mandatory. Defines the error handling mode:
  • In "default" mode, any error from the Livy operator is immediately returned to the execution engine. As a consequence, the whole graph fails and the remaining part of your pipeline is not executed.

  • In "pipeline" mode, all errors are streamed to the error port and execution continues. This mode allows you to build different pipelines depending on the status of the Spark job submitted through Livy.

Default: "default"

securityContext

string

Defines the security context to use. This parameter must be set to communicate with a secure Livy endpoint.

proxyUser

string

User to impersonate when starting a session or running a job. It is also used as the value of the "X-Requested-By" header in HTTP requests to avoid being blocked by CSRF protection. If this value is empty, "X-Requested-By" is set to "hdfs".

Default: "hdfs"

jars

string

Jars to be used in this session or batch.

Default: "jar1,jar2,..."

conf

string

The value for the "--conf" argument of spark-submit. You must enter the configuration as a JSON object, wrapped in curly braces.

Default: "{"key1":"value1","key2":"value2"}"

accessToken

string

OAuth access token.

Default: ""

tlsRootCACert

string

The root certificate of the CA. This parameter is useful if you are using a proprietary CA to sign the server certificate for Livy.

Default: ""

tlsSkipVerify

bool

If set to true, certificate validation is disabled.

Default: "false"

batchName

(For jar mode only)

string

The name of the batch.

Default: "default batch name"

jarPath

(For jar mode only)

string

Mandatory. Path to the jar to be submitted.

Default: "hdfs://path-to-jar"

className

(For jar mode only)

string

Mandatory. Name of the class to be executed in the jar.

Default: "org.com.smth.className"

args

(For jar mode only)

string

Command line arguments for the application.

Default: "arg1,arg2,..."

sessionName

(For snippet mode only)

string

The name of this session.

Default: "default session name"

snippet

(For snippet mode only)

string

Mandatory. Snippet of code that has to be submitted.

Default: "snippet of code"

snippetType

(For snippet mode only)

string

Mandatory. Defines the kind of session that should be created for snippet execution (language of snippet). Possible values: spark, pyspark, pyspark3 or sparkr.

Default: "spark"

strictSnippetExecutionMode

(For snippet mode only)

bool

Mandatory. Defines whether the operator tolerates errors in the snippet execution output, that is, it switches between strict and non-strict snippet submission. If set to true, strict mode is used.

Default: "false"
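Several of the parameters above are plain strings that carry structured values: conf holds a JSON object, while the defaults for jars and args suggest comma-separated lists. The sketch below shows one way such strings could be validated before use. It is an assumption about reasonable handling, not a description of how the operator parses them, and all function names are hypothetical. The header helper follows the proxyUser description above.

```python
import json


def parse_conf(conf):
    """Parse the conf string; it must be a JSON object (wrapped in curly braces)."""
    value = json.loads(conf)
    if not isinstance(value, dict):
        raise ValueError("conf must be a JSON object")
    return value


def split_list(value):
    """Split a comma-separated string such as "jar1,jar2" into a list."""
    return [item.strip() for item in value.split(",") if item.strip()]


def request_headers(proxy_user):
    """Use proxyUser for X-Requested-By, falling back to "hdfs" when it is empty."""
    return {"X-Requested-By": proxy_user or "hdfs"}


# The Spark settings below are example values, not operator defaults.
print(parse_conf('{"spark.executor.memory": "2g", "spark.executor.cores": "2"}'))
print(split_list("jar1, jar2"))
print(request_headers(""))
```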

Input

Input

Type

Description

inport

string

Accepts the path to a jar or the snippet of code to be submitted. An input signal triggers job submission.

Output

Output

Type

Description

out

string

The following information will be sent to this port if the submitted job finishes successfully:
  • sourceType jar: the corresponding Livy log

  • sourceType snippet: output of the executed snippet

error

string

If the submitted job fails and the Livy operator uses the "pipeline" error handling mode, the error is routed to this port.
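The interplay between the two output ports and errorHandlingMode can be summarized in a short sketch. Ports are modeled here as plain Python lists, and route_result is a hypothetical function; the real operator and execution engine are not shown.

```python
def route_result(result, error, mode, out_port, error_port):
    """Deliver a job outcome according to the error handling mode.

    Successful results go to the out port. Errors go to the error port in
    "pipeline" mode and are raised in "default" mode, which stands in for
    returning the error to the execution engine and failing the graph.
    """
    if error is None:
        out_port.append(result)
    elif mode == "pipeline":
        error_port.append(str(error))
    else:
        raise error


out_port, error_port = [], []

# A successful job: its output reaches the out port in either mode.
route_result("livy log", None, "default", out_port, error_port)

# A failed job in pipeline mode: the error is routed and execution continues.
route_result(None, RuntimeError("job failed"), "pipeline", out_port, error_port)

print(out_port)
print(error_port)
```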