Distribute Workload

 

Queues distribute work across a larger system. If more than one machine can perform a task, you can use a queue to distribute the work among these machines: attach a process server to each machine, and attach the queue to these process servers. As jobs start, they are automatically distributed across the process servers.

Figure: A queue and two process servers working normally. The jobs are evenly allocated to both servers.
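
The even allocation described above can be illustrated with a minimal sketch. The text does not say which allocation policy the product uses, so the "fewest jobs first" rule here is an assumption, and `ProcessServer` and `distribute` are hypothetical names, not part of any product API.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessServer:
    """Hypothetical stand-in for a process server attached to one machine."""
    name: str
    jobs: list = field(default_factory=list)


def distribute(job_names, servers):
    """Give each job to the server that currently holds the fewest jobs.

    Assumed policy for illustration only; ties go to the first server listed.
    """
    for job in job_names:
        target = min(servers, key=lambda s: len(s.jobs))
        target.jobs.append(job)


servers = [ProcessServer("ps1"), ProcessServer("ps2")]
distribute(["job1", "job2", "job3", "job4"], servers)
for server in servers:
    print(server.name, server.jobs)
```

With two servers and four jobs, each server ends up with two jobs, which matches the evenly allocated case in the figure.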

This setup also provides failover in both planned and unplanned scenarios.

In a planned scenario (for example, when an upgrade of the operating system is required), the process server is first shut down gracefully: all running jobs are allowed to complete, but no new jobs start on it. New jobs automatically start on one of the other process servers until the planned work is completed and the process server is restarted.

Figure: A planned down-time scenario. All jobs are allocated to one server.
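
The graceful shutdown behavior can be sketched as follows. This is a simplified model under stated assumptions: the `accepting` flag, the `shut_down_gracefully` method, and the `start_job` helper are hypothetical names chosen for illustration, and the placement rule (least-loaded server) is assumed rather than taken from the product.

```python
class ProcessServer:
    """Hypothetical process server that can be drained before maintenance."""

    def __init__(self, name):
        self.name = name
        self.accepting = True   # becomes False once a graceful shutdown begins
        self.running = []

    def shut_down_gracefully(self):
        # Running jobs are left alone so they can complete;
        # only new work is refused from now on.
        self.accepting = False


def start_job(job, servers):
    """Start the job on the least-loaded server that still accepts work."""
    candidates = [s for s in servers if s.accepting]
    if not candidates:
        raise RuntimeError("no process server available for " + job)
    target = min(candidates, key=lambda s: len(s.running))
    target.running.append(job)
    return target.name


ps1, ps2 = ProcessServer("ps1"), ProcessServer("ps2")
start_job("job1", [ps1, ps2])    # both idle, so the job lands on ps1
ps1.shut_down_gracefully()       # planned maintenance begins on ps1
placed = [start_job(j, [ps1, ps2]) for j in ("job2", "job3")]
print(placed)                    # new jobs go to ps2 only
print(ps1.running)               # job1 is still allowed to complete
```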

In an unplanned scenario, the process server is unreachable. This can be due to a network failure or a hardware failure.

Figure: An unplanned, network-related down-time scenario. All jobs are allocated to one server; the connection to the second server is interrupted.

In the case of a network failure, jobs that are already running continue to execute and complete on the machine. The process server then attempts to notify the central system that the job has completed, and it keeps retrying if the notification fails.
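
The retry behavior can be pictured with a short sketch. The text only says that the process server keeps retrying, so the fixed delay between attempts, the use of `ConnectionError` to signal an unreachable central system, and the names `notify_until_delivered` and `flaky_send` are all assumptions made for illustration.

```python
import time


def notify_until_delivered(send, job_id, delay_seconds=1.0, sleep=time.sleep):
    """Call send(job_id) until it succeeds; return the number of attempts."""
    attempts = 0
    while True:
        attempts += 1
        try:
            send(job_id)
            return attempts
        except ConnectionError:
            # Central system still unreachable: wait, then try again.
            sleep(delay_seconds)


# Simulated network that fails twice before the connection is restored.
outcomes = iter([ConnectionError, ConnectionError, None])
delivered = []


def flaky_send(job_id):
    outcome = next(outcomes)
    if outcome is not None:
        raise outcome("central system unreachable")
    delivered.append(job_id)


# The sleep function is replaced with a no-op so the example runs instantly.
attempts = notify_until_delivered(flaky_send, "job42", sleep=lambda s: None)
print(attempts, delivered)
```

Because the completion notice is held and resent, the job's result is not lost while the network is down.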

Figure: An unplanned, system-related down-time scenario. All jobs are allocated to one server; the second server is unavailable.

In the case of a machine failure, jobs on the machine are set to the UNKNOWN status. This status indicates that their results may not be reliable, because the machine failed.
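
As a rough sketch of this status change, the function below marks the jobs of a failed machine as UNKNOWN. Only the UNKNOWN status comes from the text; the `RUNNING` and `COMPLETED` status names, the dictionary layout of a job, and the function `mark_jobs_unknown` are hypothetical.

```python
def mark_jobs_unknown(jobs, failed_server):
    """Set every job that was running on the failed server to UNKNOWN."""
    for job in jobs:
        if job["server"] == failed_server and job["status"] == "RUNNING":
            job["status"] = "UNKNOWN"
    return jobs


jobs = [
    {"id": "job1", "server": "ps1", "status": "RUNNING"},
    {"id": "job2", "server": "ps2", "status": "RUNNING"},
    {"id": "job3", "server": "ps2", "status": "COMPLETED"},  # finished earlier
]
mark_jobs_unknown(jobs, failed_server="ps2")
statuses = {job["id"]: job["status"] for job in jobs}
print(statuses)
```

In this sketch, jobs on the healthy server and jobs that finished before the failure keep their status; only the jobs caught mid-run are flagged for review.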