Replication and Failover

The Replication Process

Lock Table and Replication Table

For each change to the lock table (for example, when the enqueue server receives an enqueue or dequeue request), the changed line is passed to the replication server, which records it in the replication table. The enqueue server sends its response to the enqueue client only after replication has completed successfully.
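To illustrate the ordering, the following minimal Python sketch models the lock table and the replication table as in-memory dictionaries keyed by a lock argument. The class and method names, the table layout, and the conflict rule are hypothetical simplifications, not SAP interfaces. A released lock is modeled as removing the line. The sketch shows only that the client reply is produced after the replication server has recorded the change.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ReplicationServer:
    # Hypothetical replica layout: lock argument -> owner.
    replication_table: Dict[str, str] = field(default_factory=dict)

    def record(self, key: str, owner: Optional[str]) -> None:
        """Record one changed lock table line; owner None models a released lock."""
        if owner is None:
            self.replication_table.pop(key, None)
        else:
            self.replication_table[key] = owner


@dataclass
class EnqueueServer:
    replica: ReplicationServer
    lock_table: Dict[str, str] = field(default_factory=dict)

    def enqueue(self, key: str, owner: str) -> str:
        if self.lock_table.get(key, owner) != owner:
            return "conflict"  # nothing changed, so nothing is replicated
        self.lock_table[key] = owner
        return self._reply_after_replication(key, owner)

    def dequeue(self, key: str, owner: str) -> str:
        if self.lock_table.get(key) != owner:
            return "not_owner"
        del self.lock_table[key]
        return self._reply_after_replication(key, None)

    def _reply_after_replication(self, key: str, owner: Optional[str]) -> str:
        # Replicate the changed line first; only then build the client reply.
        self.replica.record(key, owner)
        return "ok"
```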

State Transfer

Before changes to the lock table can be replicated, the replication table must first be brought up to date. This is done with a state transfer: the enqueue server sends the entire current lock table to the replication server. Only then can subsequent lock table changes be transferred as delta information.
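A minimal sketch of this ordering, assuming a replica that refuses delta information until it has received a complete state transfer. The class name and the dictionary representation are hypothetical.

```python
from typing import Dict, Optional


class Replica:
    """Hypothetical replica that accepts deltas only after a full state transfer."""

    def __init__(self) -> None:
        self.table: Dict[str, str] = {}
        self.synchronized = False

    def full_state_transfer(self, lock_table: Dict[str, str]) -> None:
        # Replace the replica with a copy of the entire current lock table.
        self.table = dict(lock_table)
        self.synchronized = True

    def apply_delta(self, key: str, owner: Optional[str]) -> None:
        # Deltas are only meaningful on top of a complete copy of the lock table.
        if not self.synchronized:
            raise RuntimeError("delta rejected: no state transfer has been done yet")
        if owner is None:
            self.table.pop(key, None)
        else:
            self.table[key] = owner
```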

If an error such as a network failure occurs while delta information is being replicated, the replica becomes out of date and must be updated, because delta information can be sent only once. In this case, only the lock table lines that are not yet up to date on the replication server are sent (differential state transfer), rather than the entire content of the lock table.
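The following sketch shows one way a differential state transfer could work. It assumes, hypothetically, that the enqueue side remembers which lock table lines have changed since the replica last acknowledged a delta; the actual bookkeeping used by the enqueue server is not described here. The `send` callable stands in for the network connection and returns False to simulate a failure. Once the replica is out of date, further changes are only queued until the differential transfer brings it back in line.

```python
from typing import Callable, Dict, Optional, Set

Send = Callable[[str, Optional[str]], bool]


class ReplicatingLockTable:
    def __init__(self, send: Send) -> None:
        self.lines: Dict[str, str] = {}
        self.send = send  # returns False to simulate, for example, a network failure
        self.stale: Set[str] = set()  # keys whose current state the replica lacks

    def set_line(self, key: str, owner: Optional[str]) -> None:
        if owner is None:
            self.lines.pop(key, None)
        else:
            self.lines[key] = owner
        if self.stale:
            # Replica is already out of date: remember the key for the next transfer.
            self.stale.add(key)
        elif not self.send(key, owner):
            # The delta is lost and is never resent as a delta.
            self.stale.add(key)

    def differential_state_transfer(self) -> bool:
        """Send only the stale lines; False means a state transfer must be run again."""
        for key in sorted(self.stale):
            # A key missing from self.lines is sent as None, i.e. "line deleted".
            if not self.send(key, self.lines.get(key)):
                return False
            self.stale.discard(key)
        return True
```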

Errors can occur not only when delta information is transferred but also during state transfers (complete or differential). If a state transfer fails, it must be executed again.

A repeated state transfer can also fail. If it fails three times, replication is stopped (replication state suspended). After a defined period (profile parameter enque/enrep/stop_timeout_s, default 300 seconds), the suspended state is exited and three more attempts are made to update the replica. If these also fail, replication is suspended again, and after another stop_timeout_s interval (five minutes with the default value) a further three attempts are made. This cycle is repeated as many times as specified by enque/enrep/stop_retries. After that, the replication server shuts down and must be restarted (for example, by the HA software). Shutting down makes it clear externally that there is a problem with replication.
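The retry schedule can be sketched as a small loop. The function name is hypothetical, and `stop_retries` is interpreted here as the number of suspend-and-retry rounds that follow the first three failed attempts; check the parameter documentation for the exact semantics. The `sleep` argument is injectable so that the schedule can be tested without actually waiting.

```python
import time
from typing import Callable

ATTEMPTS_PER_ROUND = 3  # from the text: three attempts before replication is suspended


def transfer_with_retries(
    transfer: Callable[[], bool],
    stop_retries: int,
    stop_timeout_s: float = 300,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True on success, False if the replication server has to shut down."""
    # Assumption: one initial round plus one round after each suspension.
    for round_no in range(1 + stop_retries):
        if round_no > 0:
            sleep(stop_timeout_s)  # replication state "suspended"
        for _ in range(ATTEMPTS_PER_ROUND):
            if transfer():
                return True
    return False  # shut down; the HA software is expected to restart the server
```

With stop_timeout_s=300 and stop_retries=2, a transfer that never succeeds is attempted nine times, with two 300-second suspensions, before the function reports that the replication server must shut down.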

Failover

The enqueue server must “obey” the replication server: if there is a failover, the enqueue server must be restarted on the host on which the replication server is running, because that host holds the replication table in a shared memory segment. The restarted enqueue server needs this segment to generate the new lock table. Afterwards, the shared memory segment is deleted.
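A minimal sketch of the failover rule, assuming the HA software exposes a map from host names to running services and that each host's replication table segment can be looked up by host name. Both data structures and all names are hypothetical; a dictionary stands in for the shared memory segment.

```python
from typing import Dict, Mapping, Set, Tuple


def enqueue_restart_host(running: Mapping[str, Set[str]]) -> str:
    """Return the host on which the enqueue server must be restarted."""
    hosts = [host for host, services in running.items() if "replication_server" in services]
    if len(hosts) != 1:
        raise LookupError(f"expected exactly one replication server host, found {hosts}")
    return hosts[0]


def restart_enqueue_server(
    running: Mapping[str, Set[str]],
    segments: Dict[str, Dict[str, str]],
) -> Tuple[str, Dict[str, str]]:
    host = enqueue_restart_host(running)
    lock_table = dict(segments[host])  # new lock table generated from the replica
    del segments[host]  # the segment is deleted once the lock table has been built
    return host, lock_table
```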

With some HA solutions, this concept cannot be implemented without problems. For more information, see Polling Concept.

Problems

The following problems can occur in a high availability environment with the SAP replication concept:

As described above, the lock table is generated from the replication table. In principle, this works even if the replication server is no longer running. Under Windows, however, shared memory segments are deleted as soon as no process is attached to them, so the replication server must still be running when the restarted enqueue server attaches to the segment. Once the enqueue server has read the replication table, it sets a flag in the shared memory to notify the replication server. The replication server then shuts down; the HA software does not need to take any action here.
Some HA software solutions do not support the concept of one software package following another package to that package’s host. For this reason, SAP has introduced the Polling Concept, in which a replication server runs on every possible failover host and does not need to be controlled by the HA software. Whether you need polling depends on your HA software; for more information, contact your HA software partner.
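The Windows handshake described in the first problem above can be sketched with Python's `multiprocessing.shared_memory` module. On Windows, the operating system frees such a segment once every handle to it has been closed, which is why the replication server keeps its handle open until the flag is set. The segment layout (a flag byte, a length field, and a JSON-encoded table) and the function names are assumptions for illustration only. Both sides run in one process here to keep the example short; in practice they are separate processes.

```python
import json
import struct
from multiprocessing import shared_memory

# Hypothetical segment layout: 1-byte "table read" flag, 4-byte payload length, JSON payload.
HEADER = struct.Struct("<BI")


def publish_replica(replication_table):
    """Replication server side: place the replication table in a new segment."""
    payload = json.dumps(replication_table).encode("utf-8")
    shm = shared_memory.SharedMemory(create=True, size=HEADER.size + len(payload))
    HEADER.pack_into(shm.buf, 0, 0, len(payload))
    shm.buf[HEADER.size:HEADER.size + len(payload)] = payload
    return shm  # the replication server keeps this handle attached


def read_replica_and_flag(segment_name):
    """Restarted enqueue server side: attach, rebuild the lock table, set the flag."""
    shm = shared_memory.SharedMemory(name=segment_name)
    try:
        _, length = HEADER.unpack_from(shm.buf, 0)
        payload = bytes(shm.buf[HEADER.size:HEADER.size + length])
        lock_table = json.loads(payload)
        HEADER.pack_into(shm.buf, 0, 1, length)  # notify the replication server
    finally:
        shm.close()
    return lock_table


def replication_server_poll(shm):
    """Replication server side: shut down (detach and remove) once the flag is set."""
    flag, _ = HEADER.unpack_from(shm.buf, 0)
    if flag != 1:
        return False
    shm.close()
    shm.unlink()  # no-op on Windows, where closing the last handle frees the segment
    return True
```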

Configuring Polling

If you require polling, activate it with the profile parameter enque/enrep/poll_interval.

More Information

Profile Parameters for the Standalone Enqueue Server

Profile Parameters for the Enqueue Replication Server

Polling Concept