Modeling Guide

Google Pub/Sub Producer

This operator receives a message from the input port and publishes it to a Google Pub/Sub topic.

  • Topic is the default topic that received messages are published to. If a message has a gcp.pubsub.topicName attribute, the producer uses that attribute's value as the topic name and publishes the message to that topic instead of Topic. The producer does not remove this attribute when publishing. If Topic is empty and the message has no topic name attribute, the producer fails regardless of the value of Fail on error.
  • Create topic if it does not exist applies only to Topic, not to topic names specified in message attributes.
  • If Fail on error is false and an error occurs, the producer only writes the error message to the error output port and does not fail the graph. The message that was to be published is discarded.
  • Subscription name to create applies only to Topic, not to topic names specified in message attributes. If it is specified, the producer makes sure, before processing any input messages, that this subscription to Topic exists on Google Pub/Sub, and creates it if it does not. Creating the subscription before publishing ensures that messages published to Topic are stored on Google Pub/Sub even if no subscription to Topic would otherwise exist at publishing time. Google Pub/Sub does not guarantee to store undelivered messages when there are no matching subscriptions.
  • If the number of publications that the producer has received but not yet published to Google Pub/Sub exceeds Maximum outstanding publications, the producer stops processing messages from the messageToPublish input port until one or more publications finish. A publication finishes when the producer writes its message ID to the publishedMessage output port, or when an error occurs and is written to the error output port.
  • When a publish times out, an error message is written to the error output port. If Fail on error is true, the graph fails.
  • The encoding of the received input message is stored in an attribute named message.encoding in the message published to the Google Pub/Sub service. The Google Pub/Sub consumer uses this attribute to reconstruct the same DataHub message.
  • Because Google Pub/Sub supports only strings as attribute keys and values, every attribute of an input message to this operator must have a string key and a string value.
  • To improve throughput, the producer batches publications. Publication batch size must be smaller than Maximum outstanding publications; the producer checks this during initialization.
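The topic selection and attribute rules above can be summarized in a short sketch. It is not the operator's implementation: the function names, the ProducerError class, and the use of a plain dict for message attributes are hypothetical, and the encoding value in the usage note is only an example. Only the attribute names gcp.pubsub.topicName and message.encoding come from this page.

```python
TOPIC_ATTRIBUTE = "gcp.pubsub.topicName"   # per-message topic override
ENCODING_ATTRIBUTE = "message.encoding"    # added to every published message


class ProducerError(Exception):
    """Hypothetical error type for conditions the producer rejects."""


def resolve_topic(default_topic, attributes):
    """Return the topic a message is published to.

    If the message carries gcp.pubsub.topicName, its value overrides the
    configured Topic. If neither yields a name, this is an error that the
    producer treats as fatal regardless of Fail on error.
    """
    topic = attributes.get(TOPIC_ATTRIBUTE, default_topic)
    if not topic:
        raise ProducerError("Topic is empty and the message has no gcp.pubsub.topicName attribute")
    return topic


def pubsub_attributes(attributes, encoding):
    """Build the attribute map sent to Google Pub/Sub.

    Every key and value must already be a string. The topic attribute is
    kept rather than removed, and the input encoding is stored under
    message.encoding so that a consumer can rebuild the original message.
    """
    for key, value in attributes.items():
        if not isinstance(key, str) or not isinstance(value, str):
            raise ProducerError(f"attribute {key!r} must have a string key and a string value")
    result = dict(attributes)
    result[ENCODING_ATTRIBUTE] = encoding
    return result
```

For example, with Topic set to "orders", a message whose attributes are {"gcp.pubsub.topicName": "audit"} resolves to the topic "audit", and its published attributes still include gcp.pubsub.topicName alongside message.encoding.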

Configuration Parameters

  • Connection (object): Mandatory. A Google Pub/Sub connection consisting of a GCP project ID and a JSON key file used to access the Pub/Sub service.
  • Topic (string): The Google Pub/Sub topic name.
  • Create topic if it does not exist (boolean): Whether to create Topic if it does not exist in the given Google Pub/Sub project. Default: true.
  • Fail on error (boolean): Whether to fail the whole graph if the producer encounters an error at runtime. Default: false.
  • Subscription name to create (string): Name of a subscription to Topic that the producer ensures exists.
  • Maximum outstanding publications (integer): Maximum number of publications that the operator has received but that have not finished publishing. Default: 1000.
  • Publish timeout (seconds) (integer): Maximum time, in seconds, allowed for publishing a publication. Default: 60.
  • Publication batch size (integer): Minimum number of publications that the producer needs to receive before sending a batch of publications to the Google Pub/Sub service. Default: 100.
  • Delay between retrying failed publications (milliseconds) (integer): Time to wait before retrying a failed publication. Default: 1000.
  • Number of retry attempts for failed publications (integer): Maximum number of retries for a failed publication. When this limit is reached, an error is generated and handled according to the value of Fail on error. Default: 5.
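The defaults, the batch size check, and the retry parameters above can be illustrated with a small sketch. The ProducerConfig field names and the publish callable are hypothetical stand-ins, not the operator's actual configuration keys or client. The sketch also assumes that "Number of retry attempts" counts retries after the first attempt, so a publication is tried at most retry_attempts + 1 times.

```python
import time
from dataclasses import dataclass


@dataclass
class ProducerConfig:
    """Producer parameters with the documented defaults (field names are hypothetical)."""
    topic: str = ""
    create_topic: bool = True
    fail_on_error: bool = False
    max_outstanding: int = 1000
    publish_timeout_s: int = 60
    batch_size: int = 100
    retry_delay_ms: int = 1000
    retry_attempts: int = 5

    def validate(self):
        # Mirrors the initialization check: the batch size must be
        # smaller than the outstanding-publication limit.
        if self.batch_size >= self.max_outstanding:
            raise ValueError(
                "Publication batch size must be smaller than Maximum outstanding publications"
            )


def publish_with_retry(publish, data, config):
    """Call publish(data), retrying up to config.retry_attempts times.

    Returns the message ID from the first successful call. If the last
    retry also fails, its exception is re-raised so the caller can handle
    it according to Fail on error.
    """
    for attempt in range(config.retry_attempts + 1):
        try:
            return publish(data)
        except Exception:
            if attempt == config.retry_attempts:
                raise
            # Assumed: a fixed delay between attempts, in milliseconds.
            time.sleep(config.retry_delay_ms / 1000)
```

A fixed delay keeps the sketch aligned with the single "Delay between retrying failed publications" parameter. The page does not say whether the real operator uses backoff.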

Input

  • messageToPublish (message): The stream of DataHub messages to publish.

Output

  • publishedMessage (message): Upon successful publication, Google Pub/Sub assigns a unique ID to each publication. The producer outputs these IDs as a stream of messages on this port, with each message ID stored as the value of the msg.MessageID attribute.
  • error (message): Errors that occur at runtime are output on this port as a stream of messages, one per error. Each error message has the attribute msg.KeyError with the value true, and its body is the error text. For errors related to publishing a message, the original message can be retrieved from the originalMessage attribute.
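The relationship between Maximum outstanding publications and the two output ports can be sketched as a counting semaphore. A slot is taken when a message is accepted and released only when the publication finishes, which happens either when its ID is written to publishedMessage or when an error is written to error. The Message class and OutstandingTracker are hypothetical stand-ins for DataHub messages and operator internals. Python lists stand in for the output ports, and msg.KeyError is modeled as the Python value True. In this sketch, try_accept returns False where the real operator would simply stop reading input.

```python
import threading
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical stand-in for a DataHub message: attributes plus a body."""
    attributes: dict
    body: object


class OutstandingTracker:
    """Sketch of the Maximum outstanding publications limit."""

    def __init__(self, max_outstanding):
        self._slots = threading.BoundedSemaphore(max_outstanding)
        self.published = []  # stands in for the publishedMessage port
        self.errors = []     # stands in for the error port

    def try_accept(self):
        """Take a slot for a new input message; False means the limit is reached."""
        return self._slots.acquire(blocking=False)

    def on_published(self, message_id):
        # Successful publication: emit the ID, then free the slot.
        self.published.append(Message({"msg.MessageID": message_id}, None))
        self._slots.release()

    def on_error(self, original, error_text):
        # Failed publication: emit the error with the original message, then free the slot.
        attributes = {"msg.KeyError": True, "originalMessage": original}
        self.errors.append(Message(attributes, error_text))
        self._slots.release()
```

With a limit of 2, a third call to try_accept fails until on_published or on_error frees a slot.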