Modeling Guide for SAP Data Hub

Basic Validation

The Validation Rule operator evaluates input data against a rule. That is, a logical expression checking for particular values, null, not null, and so on. The user may define one or more rules and give each rule a meaningful name.

Input data that evaluates to true are sent to the 'pass' port, and failing records are similarly sent to the 'fail' port. There is a 'failInformation' port that shows the rules that failed and a unique ID for those records. The 'outError' port receives messages to help diagnose problems, such as corrupt input stream characters.

Configure and Run the Graph

Follow the steps below to run the example from the Data Pipeline UI:
  1. Click the Validation Rule Demo icon to load the sample graph. Notice the Validation Rule operator has four output ports (described above) attached to four Terminal operators.
  2. In the toolbar, select Run (play button). The Status panel indicates whether the graph is running.
  3. To view the passing records, click the Pass Output terminal, and then select Open UI.
  4. To view the failing records, click the Fail Output terminal, and then select Open UI.
  5. To view the failure information, click the Fail Info Output terminal, and then select Open UI.
  6. To view any error messages, click the Error Output terminal, and then select Open UI. For this example graph, there will be no errors.

Rules and Substitution Values Explained

The Validation Rule operator in this example is configured not only to evaluate the rule, but also to update a column with a substitution value in the event a rule failure. This section explains how to use a rule and an associated substitution value in more detail.
  1. The input data generated by the Data Generator has four columns: ID, Name, Title, and Salary. The Name, Title, and Salary values are randomly selected, and the Title field has the possibility of having a null value.
  2. Click the Validation Rule operator and select Open Configuration. The configuration values are shown in the right pane.
  3. Click on the Rules * option. The rule "Title is NOT NULL" is shown.
    • By default, this means that records with the Title value of NOT NULL are sent to the 'pass' port, and records with a Title value of null, are sent to the 'fail' port. This behavior is controlled by the 'Fail Action' setting, which has a FAIL default value.

  4. But for this example, notice that 'Fail Action' is set to BOTH. This instructs the Validation Rule operator to send records that fail the rule to both the 'pass' port and ALSO the 'fail' port. Why would you do this?
    • You can provide a substitution value to update or correct the failed records.
    • The substitution value "fixes" the aspects of a record that failed the rule and all the records are sent to one port. The records are output in the same order from the input source, even for the records that have the substitution value included.
  5. Click the Substitutions option. The substitution "Title - 'Intern'" is shown.
    • Because the substitution is defined as 'Intern', any records that fail AND are sent to the 'pass' port have the Title column updated with the value of 'Intern'. Those records that go the 'fail' port do not have the Title column updated, and remain null.

Input data (ID, Name, Title, Salary), 3 records

  • Notice that record 2 has a null Title value.
    1,Ben,Software Engineer,60543.45
    2,Drew,,0.0
    3,Glenn,CEO,123054.34
'pass' port:
  • Receives all input records (passing and failing records).

  • Notice the failed record (#2) received the substitution value of 'Intern' for the Title column.

  • Substitution action only occurs to records that fail a rule AND are sent to the 'pass' port.
    1,Ben,Software Engineer,60543.45
    2,Drew,Intern,0.0
    3,Glenn,CEO,123054.34
'fail' port:
  • Receives the one record that failed the rule.

  • Notice the 'B' indicating record was sent to BOTH the 'pass' and 'fail' ports.

  • Failed rules are shown with a unique identifier.
    2,Drew,,0.0,B,Failed rules(s):  Title,1.0
'failInformation' port:
  • Receives information on failed records: unique ID, rule name, and column name.
    1.0,Rule_1,Title