Modeling Guide

R Training Pipeline (Iris Dataset)

This graph trains a flower classifier with R on the Iris Dataset.

The training part of the Iris Dataset is loaded with the Blob Consumer operator. The dataset is then converted to a string so that it can be processed by a JavaScript operator called Chunk Data. To view the code of this operator, right-click it and choose Open Script.

This operator splits the incoming dataset into lines and sends each line as a separate data packet. The multiplexer operator appears twice in the graph, in both cases only so that a terminal operator can peek into the flow, so the multiplexers are ignored in the rest of this explanation. After Chunk Data, the data flows to the Prune Faulty Lines JavaScript operator, which blocks faulty lines. The Create Batch operator then accumulates the data again by inserting a newline character between the received strings. The resulting batch is a string in which each line represents one example from the dataset: the features of a flower followed by its correct class.

In the RClient operator, this batch is fed to a KNN classifier, which is trained on those examples. The resulting model is then serialized and sent to the model producer, which saves the model in the file system.
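The Chunk Data, Prune Faulty Lines, and Create Batch steps can be sketched in plain JavaScript as follows. This is only an illustration and not the script shipped with the operators: the function names chunkData, isValidLine, and createBatch are hypothetical, the sketch does not use the operator port API, and the rule for a faulty line (anything other than four numeric features followed by a non-empty class label) is an assumption based on the usual layout of the Iris Dataset.

```javascript
// Hypothetical helpers; the actual operator scripts may differ.

// Chunk Data: split the dataset string into one packet per line.
function chunkData(dataset) {
  return dataset.split(/\r?\n/);
}

// Prune Faulty Lines: assumed rule, keep only lines that have
// four numeric features followed by a non-empty class label.
function isValidLine(line) {
  const fields = line.trim().split(",");
  if (fields.length !== 5) {
    return false;
  }
  const features = fields.slice(0, 4);
  const label = fields[4].trim();
  const allNumeric = features.every(
    (f) => f.trim() !== "" && Number.isFinite(Number(f))
  );
  return allNumeric && label.length > 0;
}

// Create Batch: join the surviving lines with a newline character.
function createBatch(lines) {
  return lines.join("\n");
}

// Sample input with an empty line, a line with a missing feature,
// and a line with a non-numeric feature.
const raw =
  "5.1,3.5,1.4,0.2,Iris-setosa\n" +
  "\n" +
  "7.0,3.2,4.7,Iris-versicolor\n" +
  "6.3,3.3,6.0,2.5,Iris-virginica\n" +
  "abc,3.0,1.0,0.1,Iris-setosa\n";

const batch = createBatch(chunkData(raw).filter(isValidLine));
console.log(batch);
```

With this sample input, only the first and fourth lines pass the assumed check, so the batch that would be handed to the RClient operator contains two examples.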