Model-State-for-Scoring.Rmd
When a model is applied by a scoring function (e.g. predict(), score(),
transform()) in the hana.ml.r package (and likewise in SAP HANA PAL), two
steps are performed sequentially: first the model is parsed, then the
parsed model is applied to the scoring data. This is how scoring
functions normally work in SAP HANA PAL, and usually it is not an issue.
However, when the trained model is complex and scoring functions are
called on it many times, model parsing can account for a large share of
the total execution time of each call. In such scenarios, parsing the
same model over and over again is wasteful.
In hana.ml.r, the two steps can be run separately, based on the mechanism implemented in SAP HANA PAL. Model parsing (de-serializing the model content from the database table and converting it into a model object) can be executed once on its own; the parsed model is then kept and can be applied repeatedly to any incoming scoring data.
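The difference between the two call patterns can be illustrated with a minimal base R sketch. It does not use hana.ml.r or SAP HANA PAL: a serialized lm() fit stands in for the model content stored in a database table, and the helper names (parse_model, apply_model, score_with_parsing) are hypothetical.

```r
# Stand-in for stored model content: a serialized lm() fit held in memory.
# (In PAL the content would live in a database table; this is an assumption
# made only so the sketch runs without a database.)
fit <- lm(mpg ~ wt + hp, data = mtcars)
model_blob <- serialize(fit, connection = NULL)

# Step 1: parse (de-serialize) the stored content into a model object.
parse_model <- function(blob) unserialize(blob)

# Step 2: apply an already parsed model to scoring data.
apply_model <- function(model, newdata) unname(predict(model, newdata = newdata))

# Default pattern: every scoring call repeats both steps.
score_with_parsing <- function(blob, newdata) {
  apply_model(parse_model(blob), newdata)
}

# Four scoring batches of eight rows each.
batches <- split(mtcars, rep(1:4, each = 8))

# Default pattern: the model is parsed four times.
res_default <- lapply(batches, function(b) score_with_parsing(model_blob, b))

# Split pattern: the model is parsed once and reused for every batch.
parsed <- parse_model(model_blob)
res_split <- lapply(batches, function(b) apply_model(parsed, b))
```

Both patterns should yield the same predictions; the split pattern simply avoids repeating step 1 for every batch.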
In hana.ml.r, a family of classes supports splitting model parsing from scoring execution. The following public/S3 methods are involved:
CreateModelState(): Should be called once a trained model is available.
The model content is read from the database tables, parsed, and kept in
a container called a state. The identifier of the model state is stored
in a database table and is assigned to the state attribute of the class
object for reference.
Scoring functions (e.g. predict(), score(), transform()): After a model
state has been created, the corresponding scoring methods of the class
can be called directly; the parsed model is then loaded automatically
from the model state for scoring.
DeleteModelState(): When the parsed model becomes obsolete, the
container holding the model state should be freed, since it consumes
memory. This is done by calling the DeleteModelState() method of the
class object.
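The create, score, delete lifecycle described above can be mimicked in plain R to make the role of the state identifier concrete. This is a conceptual sketch only: the in-memory registry and the functions create_model_state, score_from_state and delete_model_state are hypothetical and are not part of hana.ml.r, where the state lives on the database side.

```r
# Hypothetical in-memory registry playing the role of the state container.
registry <- new.env()
registry$counter <- 0L
registry$states <- list()

# Parse the stored model once, keep it, and hand back an identifier.
create_model_state <- function(model_blob) {
  registry$counter <- registry$counter + 1L
  state_id <- paste0("state_", registry$counter)
  registry$states[[state_id]] <- unserialize(model_blob)
  state_id
}

# Score from the kept model; no parsing happens here.
score_from_state <- function(state_id, newdata) {
  model <- registry$states[[state_id]]
  if (is.null(model)) {
    stop("No model state with id '", state_id, "'.")
  }
  predict(model, newdata = newdata)
}

# Release the memory held by a state that is no longer needed.
delete_model_state <- function(state_id) {
  registry$states[[state_id]] <- NULL
  invisible(NULL)
}

# A serialized lm() fit stands in for model content in a database table.
blob <- serialize(lm(dist ~ speed, data = cars), connection = NULL)

id <- create_model_state(blob)
res1 <- score_from_state(id, data.frame(speed = c(10, 20)))
res2 <- score_from_state(id, data.frame(speed = 15))
delete_model_state(id)
```

After deletion, a further call to score_from_state(id, ...) fails in this sketch, which mirrors why a state should only be deleted once no more scoring calls are expected.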
The following classes support model state:
hanaml.SVC()
hanaml.SVR()
hanaml.SVRanking()
hanaml.OneClassSVM()
hanaml.RDTClassifier()
hanaml.RDTRegressor()
hanaml.DecisionTreeClassifier()
hanaml.DecisionTreeRegressor()
hanaml.HGBTClassifier()
hanaml.HGBTRegressor()
hanaml.KMeans()
hanaml.DBSCAN()
hanaml.SOM()
hanaml.LatentDirichletAllocation()
hanaml.PCA()
hanaml.CATPCA()
hanaml.NaiveBayes()
hanaml.MLPClassifier()
hanaml.MLPRegressor()
hanaml.LinearRegression()
hanaml.LogisticRegression()
hanaml.FRM()
hanaml.ALS()
hanaml.CRF()
hanaml.KNNClassifier()
hanaml.KNNRegressor()
hanaml.UnifiedClassification()
hanaml.UnifiedRegression()
# Train a decision tree classifier through the unified interface
udtc <- hanaml.UnifiedClassification(data = train.df, func = "DecisionTree")
udtc$CreateModelState()  # parse the model once and keep it as a model state
res1 <- predict(udtc, data = test.df1, key = "ID")  # scored from the parsed model
res2 <- predict(udtc, data = test.df2, key = "ID")  # reuses the same model state
udtc$DeleteModelState()  # free the memory held by the model state
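Because a model state consumes memory until it is deleted, it can help to tie the deletion to the end of the scoring work. Below is a minimal sketch of one way to do that with on.exit(). The helper with_model_state() is hypothetical and not part of hana.ml.r; it only assumes that the model object exposes CreateModelState() and DeleteModelState() as in the example above. A mock object is used so the sketch runs without a database connection.

```r
# Hypothetical helper: runs fn(model) while a model state exists and always
# deletes the state afterwards, even if fn() raises an error.
with_model_state <- function(model, fn) {
  model$CreateModelState()
  # Registered only after the state was created successfully.
  on.exit(model$DeleteModelState(), add = TRUE)
  fn(model)
}

# Mock standing in for a fitted model object; it only records method calls.
make_mock_model <- function() {
  self <- new.env()
  self$calls <- character(0)
  self$CreateModelState <- function() self$calls <- c(self$calls, "create")
  self$DeleteModelState <- function() self$calls <- c(self$calls, "delete")
  self
}

mock <- make_mock_model()

# Normal case: the state is created, used, then deleted.
out <- with_model_state(mock, function(m) "scored")

# Failure case: the state is still deleted although scoring raised an error.
failed <- try(
  with_model_state(mock, function(m) stop("scoring failed")),
  silent = TRUE
)
```

With a real model object such as udtc, the scoring calls would go inside the function passed as fn, for example function(m) predict(m, data = test.df1, key = "ID").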