hanaml.OnlineLinearRegression.Rd
hanaml.OnlineLinearRegression is an R wrapper for the SAP HANA PAL online linear regression algorithm.
hanaml.OnlineLinearRegression(
enet.lambda = NULL,
enet.alpha = NULL,
max.iter = NULL,
tol = NULL
)
enet.lambda, double, optional
Penalty weight. Must be greater than or equal to 0.
Defaults to 0.
enet.alpha, double, optional
Elastic net mixing parameter.
Ranges from 0 (Ridge penalty) to 1 (LASSO penalty), inclusive.
Defaults to 0.
max.iter, integer, optional
Maximum number of passes over the training data.
Defaults to 1000.
tol, double, optional
Convergence threshold for coordinate descent.
Defaults to 1.0e-5.
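For orientation, the way enet.lambda and enet.alpha usually combine in an elastic net objective can be written as below. This is the common textbook convention (the one used by glmnet, for example) and is only an assumption here: the exact scaling of the loss and penalty terms used by PAL is not stated in this page. Here \lambda stands for enet.lambda, \alpha for enet.alpha, and n for the number of training rows.

```latex
\min_{\beta_0,\,\beta}\;
\frac{1}{2n}\sum_{i=1}^{n}\bigl(y_i - \beta_0 - x_i^{\top}\beta\bigr)^2
\;+\;
\lambda\left(\frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 + \alpha\,\lVert\beta\rVert_1\right)
```

With \alpha = 0 only the squared (Ridge) term remains, with \alpha = 1 only the absolute-value (LASSO) term remains, and with \lambda = 0 the problem reduces to ordinary least squares.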
Returns a 'hanaml.OnlineLinearRegression' object with the following values:
coefficients : DataFrame
Fitted regression coefficients.
intermediate.result : DataFrame
Intermediate model.
Online linear regression is an online version of linear regression, used when the training data arrive in multiple rounds, with additional data obtained in each round. In each round it combines the currently computed linear model with the newly obtained data, adapting the model to make its predictions as precise as possible.
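The idea of updating a linear model round by round can be illustrated in base R. The sketch below is not the PAL implementation: it assumes plain least squares with no elastic net penalty, and it keeps the running sums X'X and X'y as the "intermediate model". The helpers update_stats() and solve_coef() are hypothetical names introduced for illustration, and the data are the Y, X1 and X2 columns of the three rounds used in the examples on this page.

```r
# Illustrative sketch only: plain least squares, no elastic net penalty.
# update_stats() and solve_coef() are hypothetical helpers, not hanaml functions.

# Fold one round of data into the running sufficient statistics.
update_stats <- function(state, X, y) {
  Xa <- cbind(1, as.matrix(X))                # prepend an intercept column
  if (is.null(state)) {
    p <- ncol(Xa)
    state <- list(xtx = matrix(0, p, p), xty = matrix(0, p, 1))
  }
  state$xtx <- state$xtx + crossprod(Xa)      # accumulate X'X
  state$xty <- state$xty + crossprod(Xa, y)   # accumulate X'y
  state
}

# Solve the normal equations for the data seen so far.
solve_coef <- function(state) drop(solve(state$xtx, state$xty))

rounds <- list(
  data.frame(Y = c(130, 124, 262, 162),
             X1 = c(7, 1, 11, 11), X2 = c(26, 29, 56, 31)),
  data.frame(Y = c(234, 258, 298, 132),
             X1 = c(7, 11, 3, 1), X2 = c(52, 55, 71, 31)),
  data.frame(Y = c(227, 256, 168, 302, 307),
             X1 = c(2, 21, 1, 11, 10), X2 = c(54, 47, 40, 66, 68))
)

state <- NULL
for (d in rounds) {
  state <- update_stats(state, d[, c("X1", "X2")], d$Y)
  print(solve_coef(state))                    # coefficients after this round
}
coef_online <- solve_coef(state)
```

After the last round, the accumulated statistics give the same coefficients as a single least squares fit on all rows at once, without keeping the earlier rounds in memory. Because no penalty is applied, the numbers need not match the PAL output for a model created with a non-zero enet.lambda.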
data, DataFrame
key, character, optional
features, character or list of characters, optional
label, character, optional
formula, formula type, optional
thread.ratio, double, optional
First, initialize a "hanaml.OnlineLinearRegression" object:
> olr <- hanaml.OnlineLinearRegression(enet.lambda=0.1,
enet.alpha=0.5,
max.iter=1200,
tol=1E-6)
Three rounds of data:
> df.1$Collect()
ID Y X1 X2 X3
0 1 130.0 7.0 26.0 -888.0
1 2 124.0 1.0 29.0 -888.0
2 3 262.0 11.0 56.0 -888.0
3 4 162.0 11.0 31.0 -888.0
> df.2$Collect()
ID Y X1 X2 X3
0 5 234.0 7.0 52.0 -888.0
1 6 258.0 11.0 55.0 -888.0
2 7 298.0 3.0 71.0 -888.0
3 8 132.0 1.0 31.0 -888.0
> df.3$Collect()
ID Y X1 X2 X3
0 9 227.0 2.0 54.0 -888.0
1 10 256.0 21.0 47.0 -888.0
2 11 168.0 1.0 40.0 -888.0
3 12 302.0 11.0 66.0 -888.0
4 13 307.0 10.0 68.0 -888.0
Round 1, invoke fit() of olr for training the model with df.1:
> olr$fit(df.1, key='ID', label='Y', features=list('X1', 'X2'))
Output:
> olr$coefficients$Collect()
VARIABLE_NAME COEFFICIENT_VALUE
0 __PAL_INTERCEPT__ 5.076245
1 X1 2.987277
2 X2 4.000540
> olr$intermediate.result$Collect()
SEQUENCE INTERMEDIATE_MODEL
0 0 "algorithm":"batch_algorithm","batch_algorith...
Round 2, invoke fit() for training the model with df.2:
> olr$fit(df.2, key='ID', label='Y', features=list('X1', 'X2'))
Output:
> olr$coefficients$Collect()
VARIABLE_NAME COEFFICIENT_VALUE
0 __PAL_INTERCEPT__ 5.094444
1 X1 2.988419
2 X2 3.999563
> olr$intermediate.result$Collect()
SEQUENCE INTERMEDIATE_MODEL
0 0 "algorithm":"batch_algorithm","batch_algorith...
Round 3, invoke fit() for training the model with df.3:
> olr$fit(df.3, key='ID', label='Y', features=list('X1', 'X2'))
Output:
> olr$coefficients$Collect()
VARIABLE_NAME COEFFICIENT_VALUE
0 __PAL_INTERCEPT__ 5.073338
1 X1 2.994118
2 X2 3.999389
> olr$intermediate.result$Collect()
SEQUENCE INTERMEDIATE_MODEL
0 0 "algorithm":"batch_algorithm","batch_algorith...
Call the score() function:
> score <- olr$score(df.2, key='ID', label='Y', features=list('X1', 'X2'))
0.9999997918249237
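This page does not say which metric score() returns. The value above is consistent with the coefficient of determination (R squared), and that is the assumption behind the following base R sketch. It recomputes the metric for df.2 from the round 3 coefficients as printed (rounded to six decimals), so it can only approximate the reported value. r_squared() is a hypothetical helper, not part of hanaml.

```r
# Hypothetical helper: coefficient of determination, 1 - SSE / SST.
r_squared <- function(y, y_hat) {
  1 - sum((y - y_hat)^2) / sum((y - mean(y))^2)
}

# Round 3 coefficients as printed in the example (rounded to six decimals).
b0 <- 5.073338
b1 <- 2.994118
b2 <- 3.999389

# Y, X1 and X2 columns of df.2 from the example.
df2 <- data.frame(Y = c(234, 258, 298, 132),
                  X1 = c(7, 11, 3, 1),
                  X2 = c(52, 55, 71, 31))

y_hat <- b0 + b1 * df2$X1 + b2 * df2$X2
r2 <- r_squared(df2$Y, y_hat)
print(r2, digits = 12)
```

Any small difference from the reported score comes from using rounded coefficients instead of the full-precision values held in the model.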