hanaml.OnlineLinearRegression.Rd
hanaml.OnlineLinearRegression is an R wrapper for the SAP HANA PAL online linear regression algorithm.
hanaml.OnlineLinearRegression( enet.lambda = NULL, enet.alpha = NULL, max.iter = NULL, tol = NULL )
| Argument | Description |
|---|---|
| enet.lambda | |
| enet.alpha | |
| max.iter | |
| tol | |
Returns a 'hanaml.OnlineLinearRegression' object with the following values:
coefficients : DataFrame
Fitted regression coefficients.
intermediate.result : DataFrame
Intermediate model.
Online linear regression is an online version of linear regression, used when the training data arrive in multiple rounds. Additional data are obtained in each round of training. By combining the current linear model with the data obtained in each round, online linear regression adapts the model to make its predictions as precise as possible.
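The idea of folding each new round into the current model can be illustrated without a HANA connection. The sketch below is not PAL's algorithm: it is plain ordinary least squares that keeps only the accumulated cross-products X'X and X'y between rounds, so earlier rows never need to be revisited. The helper names (new_state, update_state, coef_from_state) are hypothetical, and the values are the X1, X2 and Y columns of the three example rounds (df.1, df.2, df.3) shown on this page.

```r
# Base R only. Hypothetical helpers, not part of hana.ml.r or PAL.

# Empty state for p features plus an intercept.
new_state <- function(p) {
  list(xtx = matrix(0, p + 1, p + 1), xty = matrix(0, p + 1, 1))
}

# Fold one round of data into the accumulated cross-products.
update_state <- function(state, x, y) {
  xi <- cbind(1, unname(as.matrix(x)))      # prepend the intercept column
  state$xtx <- state$xtx + crossprod(xi)    # X'X accumulates across rounds
  state$xty <- state$xty + crossprod(xi, y) # X'y accumulates across rounds
  state
}

# Solve the normal equations for the current coefficients.
coef_from_state <- function(state) {
  drop(solve(state$xtx, state$xty))
}

# X1, X2 and Y from the three rounds in the example.
df1 <- data.frame(X1 = c(7, 1, 11, 11), X2 = c(26, 29, 56, 31),
                  Y = c(130, 124, 262, 162))
df2 <- data.frame(X1 = c(7, 11, 3, 1), X2 = c(52, 55, 71, 31),
                  Y = c(234, 258, 298, 132))
df3 <- data.frame(X1 = c(2, 21, 1, 11, 10), X2 = c(54, 47, 40, 66, 68),
                  Y = c(227, 256, 168, 302, 307))

state <- new_state(2)
for (d in list(df1, df2, df3)) {
  state <- update_state(state, d[, c("X1", "X2")], d$Y)
  print(round(coef_from_state(state), 6))   # intercept, X1, X2 after this round
}
beta <- coef_from_state(state)
```

The example rows satisfy Y = 5 + 3*X1 + 4*X2 exactly, so the unpenalized solution is (5, 3, 4) after every round. The coefficients PAL reports in the example differ slightly from these values, which would be consistent with the elastic-net settings passed to the constructor, although this page does not say so.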
data : DataFrame
key : character, optional
features : character or list of characters, optional
label : character, optional
formula : formula type, optional
thread.ratio : double, optional
First, initialize a "hanaml.OnlineLinearRegression" object:
> olr <- hanaml.OnlineLinearRegression(enet.lambda=0.1,
enet.alpha=0.5,
max.iter=1200,
tol=1E-6)
Three rounds of data:
> df.1$Collect()
   ID      Y    X1    X2      X3
0   1  130.0   7.0  26.0  -888.0
1   2  124.0   1.0  29.0  -888.0
2   3  262.0  11.0  56.0  -888.0
3   4  162.0  11.0  31.0  -888.0
> df.2$Collect()
   ID      Y    X1    X2      X3
0   5  234.0   7.0  52.0  -888.0
1   6  258.0  11.0  55.0  -888.0
2   7  298.0   3.0  71.0  -888.0
3   8  132.0   1.0  31.0  -888.0
> df.3$Collect()
   ID      Y    X1    X2      X3
0   9  227.0   2.0  54.0  -888.0
1  10  256.0  21.0  47.0  -888.0
2  11  168.0   1.0  40.0  -888.0
3  12  302.0  11.0  66.0  -888.0
4  13  307.0  10.0  68.0  -888.0
Round 1, invoke fit() of olr for training the model with df.1:
> olr$fit(df.1, key='ID', label='Y', features=list('X1', 'X2'))
Output:
> olr$coefficients$Collect()
       VARIABLE_NAME  COEFFICIENT_VALUE
0  __PAL_INTERCEPT__           5.076245
1                 X1           2.987277
2                 X2           4.000540
> olr$intermediate.result$Collect()
   SEQUENCE                                 INTERMEDIATE_MODEL
0         0  "algorithm":"batch_algorithm","batch_algorith...
Round 2, invoke fit() for training the model with df.2:
> olr$fit(df.2, key='ID', label='Y', features=list('X1', 'X2'))
Output:
> olr$coefficients$Collect()
       VARIABLE_NAME  COEFFICIENT_VALUE
0  __PAL_INTERCEPT__           5.094444
1                 X1           2.988419
2                 X2           3.999563
> olr$intermediate.result$Collect()
   SEQUENCE                                 INTERMEDIATE_MODEL
0         0  "algorithm":"batch_algorithm","batch_algorith...
Round 3, invoke fit() for training the model with df.3:
> olr$fit(df.3, key='ID', label='Y', features=list('X1', 'X2'))
Output:
> olr$coefficients$Collect()
       VARIABLE_NAME  COEFFICIENT_VALUE
0  __PAL_INTERCEPT__           5.073338
1                 X1           2.994118
2                 X2           3.999389
> olr$intermediate.result$Collect()
   SEQUENCE                                 INTERMEDIATE_MODEL
0         0  "algorithm":"batch_algorithm","batch_algorith...
Call the score() function:
> score <- olr$score(df.2, key='ID', label='Y', features=list('X1', 'X2'))
0.9999997918249237
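The page does not say which metric score() returns. Assuming it is the coefficient of determination (R squared), the value can be approximated offline from the coefficients printed after round 3 and the rows of df.2. The sketch below uses base R only; r_squared is a hypothetical helper, not part of hana.ml.r.

```r
# Rows of df.2 as shown in the example.
df2 <- data.frame(X1 = c(7, 11, 3, 1),
                  X2 = c(52, 55, 71, 31),
                  Y  = c(234, 258, 298, 132))

# Coefficients printed after round 3 (rounded to six decimals in the output).
beta <- c(intercept = 5.073338, X1 = 2.994118, X2 = 3.999389)

# Hypothetical helper: coefficient of determination, 1 - SSR / SST.
r_squared <- function(y, y_hat) {
  1 - sum((y - y_hat)^2) / sum((y - mean(y))^2)
}

# Predictions of the linear model on df.2.
y_hat <- beta[["intercept"]] + beta[["X1"]] * df2$X1 + beta[["X2"]] * df2$X2

r2 <- r_squared(df2$Y, y_hat)
print(r2, digits = 12)
```

Under that assumption the result should land very close to the score reported above. A small difference in the last digits is expected, because the printed coefficients are rounded.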