Model State for Real-time Scoring

Introduction

When a model is applied in a scoring function (e.g. predict(), score(), transform()) in the hana.ml.r package (as in SAP HANA PAL), two steps are performed sequentially: first the model is parsed, then the parsed model is applied to the scoring data. This is how scoring functions normally work in SAP HANA PAL, and it is usually not an issue. However, when the trained model is complex and scoring functions are called on it many times, model parsing can account for a large share of the total scoring time. In such scenarios, parsing the model over and over again is wasteful.

hana.ml.r allows the two steps to be run separately, based on a mechanism implemented in SAP HANA PAL. Model parsing (deserializing the model content from the database table and converting it into a model object) can be executed once on its own, and the parsed model can then be kept and applied repeatedly to any incoming scoring data.
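The difference between parsing on every call and parsing once can be illustrated with a small base R sketch. It does not use hana.ml.r or SAP HANA PAL: the serialized lm model, the names parse.model and score.stateless, and the parse counter are all stand-ins invented for illustration, under the assumption that deserialization plays the role of the expensive parsing step.

```r
# Toy stand-in for a model stored in serialized form (base R only, no SAP HANA).
fit <- lm(mpg ~ wt, data = mtcars)
stored.model <- serialize(fit, connection = NULL)  # raw vector, like model content in a table

parse.count <- 0
parse.model <- function(raw.content) {
  parse.count <<- parse.count + 1  # count how often the expensive step runs
  unserialize(raw.content)
}

# Default behaviour: every scoring call parses the model first, then scores.
score.stateless <- function(raw.content, newdata) {
  predict(parse.model(raw.content), newdata = newdata)
}

# Four scoring batches of eight rows each.
batches <- split(mtcars, rep(1:4, length.out = nrow(mtcars)))

res.stateless <- lapply(batches, function(b) score.stateless(stored.model, b))
stateless.parses <- parse.count  # 4: one parse per scoring call

# State-style behaviour: parse once, keep the parsed model, reuse it.
parse.count <- 0
state <- parse.model(stored.model)
res.state <- lapply(batches, function(b) predict(state, newdata = b))
state.parses <- parse.count  # 1: a single parse serves all batches
```

Both variants return the same predictions; only the number of parsing steps differs, which is the saving a model state is meant to provide.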

In hana.ml.r, a family of classes supports splitting model parsing from scoring execution. The following public/S3 methods are involved:

CreateModelState(): Call this on a trained model. The model content is read from the database tables, parsed, and kept in a container called a state. The identifier of the model state is stored in a database table and assigned to the state attribute of the class object for reference.
Scoring functions (e.g. predict(), score(), transform()): After a model state has been created, call the corresponding scoring functions on the class object as usual; the parsed model is then loaded automatically from the model state for scoring.
DeleteModelState(): When the parsed model becomes obsolete, free the container holding the model state, since it consumes memory. This is done by calling the DeleteModelState() method of the class object.

Classes/Algorithms Supporting State-Enabled Scoring

SVC: hanaml.SVC()
SVR: hanaml.SVR()
SVRanking: hanaml.SVRanking()
OneClassSVM: hanaml.OneClassSVM()
RDTClassifier: hanaml.RDTClassifier()
RDTRegressor: hanaml.RDTRegressor()
DecisionTreeClassifier: hanaml.DecisionTreeClassifier()
DecisionTreeRegressor: hanaml.DecisionTreeRegressor()
HGBTClassifier: hanaml.HGBTClassifier()
HGBTRegressor: hanaml.HGBTRegressor()
KMeans: hanaml.KMeans()
DBSCAN: hanaml.DBSCAN()
SOM: hanaml.SOM()
LatentDirichletAllocation: hanaml.LatentDirichletAllocation()
PCA: hanaml.PCA()
CATPCA: hanaml.CATPCA()
NaiveBayes: hanaml.NaiveBayes()
MLPClassifier: hanaml.MLPClassifier()
MLPRegressor: hanaml.MLPRegressor()
LinearRegression: hanaml.LinearRegression()
LogisticRegression: hanaml.LogisticRegression()
FRM: hanaml.FRM()
ALS: hanaml.ALS()
CRF: hanaml.CRF()
KNNClassifier: hanaml.KNNClassifier()
KNNRegressor: hanaml.KNNRegressor()
UnifiedClassification: hanaml.UnifiedClassification()
UnifiedRegression: hanaml.UnifiedRegression()

Usage Example

# Train a unified classification model
udtc <- hanaml.UnifiedClassification(data = train.df, func = "DecisionTree")
udtc$CreateModelState()  # parse the model once and keep it as a model state
res1 <- predict(udtc, data = test.df1, key = "ID")  # scored directly from the parsed model
res2 <- predict(udtc, data = test.df2, key = "ID")  # reuses the same model state
udtc$DeleteModelState()  # free the memory held by the model state
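
Because a model state holds memory until it is deleted, it may be worth making sure DeleteModelState() runs even when a scoring call fails. The following is a minimal sketch of one way to do that with base R's on.exit(). WithModelState and NewMockModel are hypothetical helpers, not part of hana.ml.r; the mock object only records which methods were called, so the sketch runs without a database connection.

```r
# Hypothetical helper (not part of hana.ml.r): create a model state, run the
# scoring code, and always delete the state afterwards, even on error.
WithModelState <- function(model, scoring.fn) {
  model$CreateModelState()
  on.exit(model$DeleteModelState(), add = TRUE)
  scoring.fn(model)
}

# Mock stand-in for a trained model object. It only logs method calls.
NewMockModel <- function() {
  self <- new.env()
  self$log <- character(0)
  self$CreateModelState <- function() self$log <- c(self$log, "create")
  self$DeleteModelState <- function() self$log <- c(self$log, "delete")
  self
}

# Normal case: the state is created, used for two batches, then deleted.
mock <- NewMockModel()
out <- WithModelState(mock, function(m) {
  m$log <- c(m$log, "score batch 1")
  m$log <- c(m$log, "score batch 2")
  "done"
})
mock$log  # "create" "score batch 1" "score batch 2" "delete"

# Failure case: the state is still deleted when scoring raises an error.
failing <- NewMockModel()
try(WithModelState(failing, function(m) stop("scoring failed")), silent = TRUE)
failing$log  # "create" "delete"
```

With a real model object, scoring.fn would contain the predict() calls from the example above, for instance function(m) predict(m, data = test.df1, key = "ID").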