Managing Data States Across Actions

Guaranteeing the accuracy of data published to target systems requires an understanding of how Data Hub processes data. Special handling may be required depending on how data is split across load and composition actions. The target system's ability to manage intermediate states also matters.

Data processing in Data Hub is fundamentally batch based. Each stage of processing may split batches in different ways: across separate load and composition actions, and across different feeds and pools. Data from various parts of this process can then be combined to produce the end result. The following sections outline the constraints and requirements that must be met to guarantee that this end result is correct.

Item Splitting

Raw items are grouped together by data loading action. All raw items received by the input channel in a single request are grouped into a single data loading action. With integrations based on IDocs, each IDoc, or batch of IDocs, is grouped into its own data loading action. Data Hub guarantees that these raw items are not split into separate composition actions unless the number of items exceeds the configured maximum composition action size.

Three Raw Items in a Single Data Loading Action are Grouped and Composed into a Single Canonical Item
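To make the splitting rule concrete, here is a minimal Java sketch. It assumes that, once a data loading action exceeds the maximum composition action size, its raw items are cut into consecutive batches in arrival order; that batching strategy is an assumption. `CompositionSplitter`, `split`, and `maxCompositionActionSize` are hypothetical names for illustration, not Data Hub classes or configuration properties.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: how the raw items of one data loading action could be
// cut into composition actions. Names are illustrative, not Data Hub APIs.
final class CompositionSplitter {

    // The items stay together unless the loading action holds more than
    // maxCompositionActionSize items; then they are cut into consecutive
    // batches in arrival order (an assumed strategy).
    static <T> List<List<T>> split(List<T> loadingAction, int maxCompositionActionSize) {
        if (maxCompositionActionSize <= 0) {
            throw new IllegalArgumentException("maxCompositionActionSize must be positive");
        }
        List<List<T>> compositionActions = new ArrayList<>();
        for (int start = 0; start < loadingAction.size(); start += maxCompositionActionSize) {
            int end = Math.min(start + maxCompositionActionSize, loadingAction.size());
            compositionActions.add(new ArrayList<>(loadingAction.subList(start, end)));
        }
        return compositionActions;
    }

    public static void main(String[] args) {
        List<String> rawItems = List.of("raw-1", "raw-2", "raw-3");
        System.out.println(split(rawItems, 1000)); // [[raw-1, raw-2, raw-3]]
        System.out.println(split(rawItems, 2));    // [[raw-1, raw-2], [raw-3]]
    }
}
```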

Action Grouping

Data loading actions are not guaranteed to be processed independently. An updated version of a raw item can arrive and then be processed in the same composition action, resulting in a single canonical item.

An Updated Raw Item is Loaded and Composed in a Single Composition Action

However, the updated version of a raw item could also arrive and be processed in a separate composition action. The processing of separate data loading actions must therefore be associative, so that both paths result in the same canonical item.

An Updated Raw Item is Loaded in a Separate Loading Action and Merged to Produce the Final State
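The associativity requirement can be illustrated with a minimal Java sketch that models a canonical item as a map of attribute values and merges with a right-biased union: the later input wins for every attribute it supplies, and attributes it omits are kept. This merge rule and the names `AssociativeMerge` and `merge` are assumptions for illustration, not Data Hub's composition logic. Whether the original and the update are composed together or in separate actions, the result is the same.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch: a canonical item modeled as a map of attribute values.
// merge() is a right-biased union, which is associative:
// merge(merge(a, b), c) equals merge(a, merge(b, c)).
final class AssociativeMerge {

    static Map<String, String> merge(Map<String, String> earlier, Map<String, String> later) {
        Map<String, String> result = new LinkedHashMap<>(earlier);
        result.putAll(later); // later values win attribute by attribute
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> original = Map.of("id", "P1", "name", "Shirt", "price", "10");
        Map<String, String> update = Map.of("id", "P1", "price", "12");
        Map<String, String> existingCanonical = Map.of();

        // Case 1: original and update are composed in one composition action.
        Map<String, String> together = merge(existingCanonical, merge(original, update));
        // Case 2: they are composed in two separate composition actions.
        Map<String, String> separate = merge(merge(existingCanonical, original), update);

        System.out.println(Objects.equals(together, separate)); // true
    }
}
```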

Transformation operations must be associative. This matters whenever multiple data loading actions contribute to the same canonical item: if the raw items do not arrive in the same data loading action, they might be split during composition, either into separate batches within a single composition action or across different composition actions. Account for this splitting in the modeling approach.

The following figures illustrate three separate raw items contributing to a single canonical item. Because the composition operation is associative, the final state is the same regardless of how the loading actions are split across composition actions.

A Single Canonical Item Results From One Composition Action
Two Composition Actions Result in an Intermediate State (1)
Two Composition Actions Result in an Intermediate State (2)
Three Composition Actions Result in Two Possible Intermediate States
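The four splits shown in the figures can be enumerated mechanically. The following hypothetical Java sketch uses a right-biased attribute merge (the later raw item wins for each attribute it supplies), which is an assumption for illustration rather than Data Hub's actual composition logic. It cuts three raw items into consecutive composition actions in every possible way, records the canonical state after each action (the states that could be published), and checks that every split ends in the same final state. `SplitEquivalence`, `splits`, and `compose` are illustrative names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: every way to split ordered raw items into consecutive
// composition actions must lead to the same final canonical item.
final class SplitEquivalence {

    static Map<String, String> merge(Map<String, String> earlier, Map<String, String> later) {
        Map<String, String> result = new LinkedHashMap<>(earlier);
        result.putAll(later); // later raw item wins per attribute
        return result;
    }

    // All 2^(n-1) ways to cut the list into non-empty, consecutive batches.
    static <T> List<List<List<T>>> splits(List<T> items) {
        List<List<List<T>>> all = new ArrayList<>();
        int n = items.size();
        for (int mask = 0; mask < (1 << Math.max(n - 1, 0)); mask++) {
            List<List<T>> actions = new ArrayList<>();
            List<T> current = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                current.add(items.get(i));
                boolean cutAfter = i < n - 1 && (mask & (1 << i)) != 0;
                if (cutAfter || i == n - 1) {
                    actions.add(current);
                    current = new ArrayList<>();
                }
            }
            all.add(actions);
        }
        return all;
    }

    // Runs the composition actions in order; returns the state after each one.
    static List<Map<String, String>> compose(List<List<Map<String, String>>> actions) {
        List<Map<String, String>> states = new ArrayList<>();
        Map<String, String> canonical = new LinkedHashMap<>();
        for (List<Map<String, String>> action : actions) {
            for (Map<String, String> raw : action) {
                canonical = merge(canonical, raw);
            }
            states.add(canonical); // available for publication after this action
        }
        return states;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rawItems = List.of(
                Map.of("id", "P1", "name", "Shirt"),
                Map.of("id", "P1", "color", "blue"),
                Map.of("id", "P1", "size", "M"));
        Map<String, String> expectedFinal = null;
        for (List<List<Map<String, String>>> split : splits(rawItems)) {
            List<Map<String, String>> states = compose(split);
            Map<String, String> finalState = states.get(states.size() - 1);
            System.out.println(split.size() + " action(s), states: " + states);
            if (expectedFinal != null && !expectedFinal.equals(finalState)) {
                throw new IllegalStateException("final state depends on the split");
            }
            expectedFinal = finalState;
        }
    }
}
```

The intermediate states differ from split to split; only the final state is guaranteed to match.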

Handling Intermediate States

Because this splitting is arbitrary, intermediate states may be published to the target system. As soon as a composition action is complete, the items it created are available for publication, so in the preceding examples an attempt might be made to publish an intermediate state to the target system. Either account for canonical item integrity in the data model, or ensure that the target system handles it.

There are several possible cases to consider, based on the nature of the data and the behavior of the target system.

  1. The target system rejects the intermediate state. Data Hub then attempts to republish the data once the completed state has been reached, and the target system accepts it.
  2. The target system accepts the intermediate state. Data Hub then later sends the completed state, thus resulting in an eventually consistent final state in the target system.
  3. The target system accepts the intermediate state, but this results in a failure when Data Hub sends the completed state. For example, this can happen when a new IDoc with the same identifier is published to SAP ERP.

Special handling is required for case 3, and for case 2 if the intermediate, inconsistent state is unacceptable for the given data. One solution is to implement a custom publication grouping handler that prevents an item from being published until it reaches its completed state.
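The gating logic such a handler might apply can be sketched in Java as follows. This is not the Data Hub publication grouping handler SPI: `CompletenessGate`, `CanonicalItem`, `partition`, and the rule that an item is complete once it carries a fixed set of attributes are all hypothetical, and a real completeness check depends on your data model. Items that fail the check are held back, on the assumption that a later publication picks them up once they are complete.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch of a completeness gate applied before publication.
// None of these names belong to the Data Hub SPI.
final class CompletenessGate {

    // Illustrative stand-in for a canonical item awaiting publication.
    record CanonicalItem(String integrationKey, Map<String, String> attributes) {}

    // Assumed rule: an item is complete once all of these attributes are present,
    // i.e. once every contributing raw item has been composed.
    static final Set<String> REQUIRED_ATTRIBUTES = Set.of("name", "color", "size");

    static boolean isComplete(CanonicalItem item) {
        return item.attributes().keySet().containsAll(REQUIRED_ATTRIBUTES);
    }

    // true -> publish now; false -> hold back for a later publication.
    static Map<Boolean, List<CanonicalItem>> partition(List<CanonicalItem> candidates) {
        return candidates.stream()
                .collect(Collectors.partitioningBy(CompletenessGate::isComplete));
    }

    public static void main(String[] args) {
        List<CanonicalItem> candidates = List.of(
                new CanonicalItem("P1", Map.of("name", "Shirt", "color", "blue")),
                new CanonicalItem("P2", Map.of("name", "Hat", "color", "red", "size", "L")));
        Map<Boolean, List<CanonicalItem>> split = partition(candidates);
        System.out.println("publish: " + split.get(true).size()
                + ", hold back: " + split.get(false).size()); // publish: 1, hold back: 1
    }
}
```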

Ordering

The ordering of raw items of a particular type within a single composition is guaranteed, based on the order in which Data Hub receives them. However, because composition is a highly concurrent process, the order in which canonical items are created from different composition groups is not guaranteed. The expectation is that, for a given set of input raw data, each composition group at the end of the grouping process becomes a unique canonical item (with a unique integration key).
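A minimal Java sketch of this grouping follows, assuming (hypothetically) that each raw item carries a receipt sequence number and that items are grouped by integration key. `CompositionGroups` and `RawItem` are illustrative names, not Data Hub types. Items inside a group keep their receipt order, while the groups are processed in parallel, so the order in which canonical items appear is not fixed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: raw items grouped by integration key, one group per
// canonical item. Not Data Hub's grouping implementation.
final class CompositionGroups {

    // Illustrative raw item: receipt sequence number, integration key, payload.
    record RawItem(long receivedSeq, String integrationKey, String payload) {}

    // Iterating the input in receipt order and appending keeps each group's
    // items in the order they were received.
    static Map<String, List<RawItem>> group(List<RawItem> receivedInOrder) {
        Map<String, List<RawItem>> groups = new LinkedHashMap<>();
        for (RawItem item : receivedInOrder) {
            groups.computeIfAbsent(item.integrationKey(), key -> new ArrayList<>()).add(item);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<RawItem> received = List.of(
                new RawItem(1, "P1", "name=Shirt"),
                new RawItem(2, "P2", "name=Hat"),
                new RawItem(3, "P1", "color=blue"));
        // Processing groups in parallel mirrors concurrent composition:
        // canonical items P1 and P2 may be finished in either order.
        group(received).values().parallelStream().forEach(items ->
                System.out.println(items.get(0).integrationKey() + " composed from " + items));
    }
}
```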

If multiple raw items in different composition actions become different versions of the same canonical item (with the same integration key), the composition process merges them into a single item. The order in which this merging takes place is nondeterministic, which can be a problem in most cases: the merge operation is generally not commutative, so changing the order in which items are merged may change the final state. Therefore, ensure that raw items are grouped correctly, so that merges take place in the correct order and this problem is avoided.
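The following hypothetical Java sketch shows why merge order matters. Two versions of the same canonical item set different prices, and the merge used here keeps whichever value is applied last (an assumption for illustration), so reversing the order changes the result. Applying the versions in receipt order, as happens when all versions fall into the same composition group, gives a deterministic result. `MergeOrder`, `Version`, and `receivedSeq` are illustrative names only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a non-commutative merge makes the final state depend
// on the order in which versions are applied.
final class MergeOrder {

    // One version of a canonical item, tagged with its receipt order.
    record Version(long receivedSeq, Map<String, String> attributes) {}

    // Assumed merge rule: incoming attributes overwrite current ones.
    static Map<String, String> merge(Map<String, String> current, Map<String, String> incoming) {
        Map<String, String> result = new LinkedHashMap<>(current);
        result.putAll(incoming);
        return result;
    }

    // Applies versions in whatever order they reach the merge step.
    static Map<String, String> mergeInArrivalOrder(List<Version> versions) {
        Map<String, String> state = new LinkedHashMap<>();
        for (Version v : versions) {
            state = merge(state, v.attributes());
        }
        return state;
    }

    // Applies versions in receipt order, which correct grouping is meant to guarantee.
    static Map<String, String> mergeInReceiptOrder(List<Version> versions) {
        List<Version> sorted = new ArrayList<>(versions);
        sorted.sort(Comparator.comparingLong(Version::receivedSeq));
        return mergeInArrivalOrder(sorted);
    }

    public static void main(String[] args) {
        Version v1 = new Version(1, Map.of("price", "10"));
        Version v2 = new Version(2, Map.of("price", "12"));
        System.out.println(mergeInArrivalOrder(List.of(v1, v2))); // {price=12}
        System.out.println(mergeInArrivalOrder(List.of(v2, v1))); // {price=10}
        System.out.println(mergeInReceiptOrder(List.of(v2, v1))); // {price=12}
    }
}
```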