Modeling Guide for SAP Data Hub

R Inference Pipeline (Iris Dataset) (Beta)

This graph predicts the classes of Iris flowers using a previously trained R classifier.

The test part of the Iris Data Set is loaded with the Blob Consumer operator. The data set is then converted to a string so that it can be processed by a JavaScript operator called Chunk Data. You can view this operator's code by right-clicking it and selecting Open Script. Chunk Data splits the incoming data set into lines and sends each line as a separate data packet. The graph uses the Multiplexer operator in two places, only so that the Terminal operator can peek into the data flow, so the rest of this explanation ignores the multiplexers.
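The Chunk Data step can be illustrated with a minimal Python sketch. The graph's real operator is written in JavaScript (its code is available through Open Script); the function names `to_string` and `chunk_data`, and the assumption that the blob is UTF-8 encoded, are hypothetical and only show the idea of turning one blob into one packet per line.

```python
from typing import Iterator


def to_string(blob: bytes, encoding: str = "utf-8") -> str:
    """Decode the raw blob into text (UTF-8 is an assumption)."""
    return blob.decode(encoding)


def chunk_data(text: str) -> Iterator[str]:
    """Yield each line of the data set as its own packet, in order."""
    for line in text.splitlines():
        yield line


if __name__ == "__main__":
    blob = b"5.1,3.5,1.4,0.2,setosa\n6.7,3.0,5.2,2.3,virginica\n"
    for packet in chunk_data(to_string(blob)):
        print(packet)
```

Empty lines are deliberately passed through unchanged here, leaving the decision about which lines are faulty to the next operator.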

After Chunk Data, the data flows to the Prune Faulty Lines JavaScript operator. This operator skips faulty examples and concatenates the remaining lines, separated by newline characters, into a single batch.
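The pruning and batching done by Prune Faulty Lines might look roughly like the following Python sketch. The guide does not say what makes a line faulty, so the check used here (four numeric feature columns followed by a non-empty class label, as in the Iris data) is an assumption, and `is_valid` and `prune_faulty_lines` are hypothetical names. The sketch also treats a finite list of packets as one batch; it does not model when the real operator decides to emit a batch.

```python
from typing import Iterable

N_FEATURES = 4  # assumption: four numeric Iris measurements per line


def is_valid(line: str, n_features: int = N_FEATURES) -> bool:
    """Hypothetical check: n_features numeric fields followed by a non-empty label."""
    fields = line.strip().split(",")
    if len(fields) != n_features + 1 or not fields[-1].strip():
        return False
    try:
        for value in fields[:n_features]:
            float(value)
    except ValueError:
        return False
    return True


def prune_faulty_lines(lines: Iterable[str]) -> str:
    """Skip invalid lines and join the rest, newline-separated, into one batch."""
    return "\n".join(line.strip() for line in lines if is_valid(line))


if __name__ == "__main__":
    packets = [
        "sepal_length,sepal_width,petal_length,petal_width,species",
        "5.1,3.5,1.4,0.2,setosa",
        "",
        "6.7,3.0,abc,2.3,virginica",
        "5.7,2.8,4.1,1.3,versicolor",
    ]
    print(prune_faulty_lines(packets))
```

With this check, an empty line, a CSV header line and a line with a non-numeric measurement are all dropped, because none of them has four parseable numeric fields followed by a label.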

The RClient operator waits for the serialized, trained KNN model on its upper input port before it starts accepting data on its lower input port. Once the model has been received, RClient predicts the classes of the examples in each batch that arrives on its dataset input port. It then computes the confusion matrix from these predictions and the examples' ground-truth labels and sends it to its output port as a string. Finally, a JavaScript operator normalizes this string and passes it to the Dashboard UI operator. To open the viewer, right-click the Dashboard UI operator and select Open UI.
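Inside RClient, prediction and evaluation are done in R. As an illustration only, the Python sketch below reproduces the idea: a KNN model is essentially the stored training examples, each test example receives the majority class among its k nearest neighbours, and the predictions are compared with the ground-truth labels in a confusion matrix that is emitted as text. The value k = 3, the Euclidean distance, the in-memory model format and the CSV layout of the matrix string are assumptions, not details of the graph, and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

# A labelled example: (feature vector, class label).
Example = Tuple[List[float], str]


def parse_batch(batch: str) -> List[Example]:
    """Split a newline-separated CSV batch into (features, label) pairs."""
    examples = []
    for line in batch.splitlines():
        *features, label = line.strip().split(",")
        examples.append(([float(x) for x in features], label.strip()))
    return examples


def knn_predict(model: Sequence[Example], x: Sequence[float], k: int = 3) -> str:
    """Return the majority label among the k stored examples closest to x.

    Assumption: Euclidean distance and k = 3; ties in the vote go to the
    label that appears first among the nearest neighbours.
    """
    nearest = sorted(model, key=lambda ex: math.dist(ex[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def confusion_matrix_string(truth: Sequence[str], pred: Sequence[str]) -> str:
    """Format a confusion matrix as CSV text: rows = actual, columns = predicted."""
    labels = sorted(set(truth) | set(pred))
    counts = Counter(zip(truth, pred))
    lines = ["actual/predicted," + ",".join(labels)]
    for actual in labels:
        cells = (str(counts[(actual, p)]) for p in labels)
        lines.append(actual + "," + ",".join(cells))
    return "\n".join(lines)


def run_inference(model: Sequence[Example], batch: str, k: int = 3) -> str:
    """Predict every example in the batch and compare against its ground truth."""
    examples = parse_batch(batch)
    truth = [label for _, label in examples]
    pred = [knn_predict(model, x, k) for x, _ in examples]
    return confusion_matrix_string(truth, pred)


if __name__ == "__main__":
    # Hypothetical stand-in for the deserialized model: a few training examples.
    model = [
        ([5.0, 3.4, 1.5, 0.2], "setosa"),
        ([4.9, 3.0, 1.4, 0.2], "setosa"),
        ([5.9, 3.0, 4.2, 1.5], "versicolor"),
        ([6.0, 2.2, 4.0, 1.0], "versicolor"),
        ([6.3, 3.3, 6.0, 2.5], "virginica"),
        ([6.5, 3.0, 5.8, 2.2], "virginica"),
    ]
    batch = "5.1,3.5,1.4,0.2,setosa\n6.7,3.0,5.2,2.3,virginica\n5.7,2.8,4.1,1.3,versicolor"
    print(run_inference(model, batch))
```

Rows hold the actual classes and columns the predicted classes. With the small example model above, each of the three test examples is classified correctly, so only the diagonal of the printed matrix is non-zero.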

Refer to the R Training Pipeline graph to see how to train and save a model for use by the R Inference Pipeline graph.