Online Linear Regression

hanaml.OnlineLinearRegression is an R wrapper for the SAP HANA PAL online linear regression algorithm.

hanaml.OnlineLinearRegression(
  enet.lambda = NULL,
  enet.alpha = NULL,
  max.iter = NULL,
  tol = NULL
)

Arguments

enet.lambda: double, optional
Penalized weight. Value should be greater than or equal to 0.
Defaults to 0.
enet.alpha: double, optional
Elastic net mixing parameter. Ranges from 0 (Ridge penalty) to 1 (LASSO penalty), inclusive. Defaults to 0.
max.iter: integer, optional
Maximum number of passes over training data. Defaults to 1000.
tol: double, optional
Convergence threshold for coordinate descent. Defaults to 1.0e-5.
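
To make the roles of enet.lambda and enet.alpha concrete, the base-R sketch below evaluates an elastic net penalty under the common glmnet-style convention lambda * (alpha * L1 + (1 - alpha) / 2 * L2^2). That convention is an assumption: this page does not state the exact scaling PAL uses, and enet_penalty is a hypothetical helper, not part of hanaml.

```r
# Hypothetical helper (not a hanaml function): elastic net penalty under an
# assumed glmnet-style convention. The intercept is normally left out of beta.
enet_penalty <- function(beta, enet.lambda = 0, enet.alpha = 0) {
  stopifnot(enet.lambda >= 0, enet.alpha >= 0, enet.alpha <= 1)
  l1 <- sum(abs(beta))   # LASSO part: sum of absolute coefficients
  l2 <- sum(beta^2)      # Ridge part: sum of squared coefficients
  enet.lambda * (enet.alpha * l1 + (1 - enet.alpha) / 2 * l2)
}

beta <- c(3, -4)
ridge_only <- enet_penalty(beta, enet.lambda = 0.1, enet.alpha = 0)    # 0.1 * 25 / 2
lasso_only <- enet_penalty(beta, enet.lambda = 0.1, enet.alpha = 1)    # 0.1 * 7
mixed      <- enet_penalty(beta, enet.lambda = 0.1, enet.alpha = 0.5)  # halfway mix
print(c(ridge = ridge_only, lasso = lasso_only, mixed = mixed))
```

With enet.lambda left at its default of 0 the penalty vanishes, whatever the value of enet.alpha.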

Value

Returns a 'hanaml.OnlineLinearRegression' object with the following values:

coefficients : DataFrame
Fitted regression coefficients.
intermediate.result : DataFrame
Intermediate model.

Details

Online linear regression is an online version of linear regression, used when the training data arrive over multiple rounds, with additional data obtained in each round. In each round, it combines the currently computed linear model with the newly obtained data and adapts the model to make predictions as precise as possible.
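
The idea of refining a model round by round can be illustrated in base R without a HANA connection. The sketch below is not PAL's algorithm: it assumes plain least squares with no elastic net penalty and keeps only the running sums X'X and X'y as its state, so each round needs just the new rows. The helper names (new_state, update_state, current_coef) are hypothetical; the data are the three rounds shown in the Examples section.

```r
# Illustrative sketch only: unpenalized least squares updated round by round
# from running sufficient statistics. Not the PAL implementation.
new_state <- function(p) list(xtx = matrix(0, p, p), xty = numeric(p))

update_state <- function(state, X, y) {
  Xi <- cbind(1, as.matrix(X))                    # prepend an intercept column
  state$xtx <- state$xtx + crossprod(Xi)          # accumulate X'X
  state$xty <- state$xty + drop(crossprod(Xi, y)) # accumulate X'y
  state
}

current_coef <- function(state) unname(solve(state$xtx, state$xty))

rounds <- list(
  data.frame(Y = c(130, 124, 262, 162),
             X1 = c(7, 1, 11, 11), X2 = c(26, 29, 56, 31)),
  data.frame(Y = c(234, 258, 298, 132),
             X1 = c(7, 11, 3, 1), X2 = c(52, 55, 71, 31)),
  data.frame(Y = c(227, 256, 168, 302, 307),
             X1 = c(2, 21, 1, 11, 10), X2 = c(54, 47, 40, 66, 68))
)

state <- new_state(3)  # intercept + two features
for (d in rounds) {
  state <- update_state(state, d[, c("X1", "X2")], d$Y)
  print(current_coef(state))  # intercept, X1, X2 after this round
}
final_coef <- current_coef(state)
```

Because the example data follow Y = 5 + 3 * X1 + 4 * X2 exactly, the unpenalized estimate recovers those values after every round. The fitted coefficients printed in the Examples section differ slightly, which is expected given that the model there is created with enet.lambda = 0.1.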

Methods

fit(data = NULL, key = NULL, features = NULL, formula = NULL, thread.ratio = NULL)
The fit function of an OnlineLinearRegression object.

Usage:
olr <- hanaml.OnlineLinearRegression()
olr$fit(data, key='ID', features=list('X1','X2'))

Arguments:
data: DataFrame
Input data.
key: character, optional
Name of the ID column.
If not provided, the data is assumed to have no ID column.
No default value.
features: character or list of characters, optional
Names of the feature columns.
If not provided, defaults to all non-key, non-label columns of data.
label: character, optional
Name of the column that specifies the dependent variable.
Defaults to the last column of data if not provided.
formula: formula type, optional
Formula to be used for model generation, in the format label ~ <feature_list>, e.g. formula = CATEGORY~V1+V2+V3.
You can give either the formula or a feature and label combination, but not both.
Defaults to NULL.
thread.ratio: double, optional
Controls the proportion of available threads that this function can use. The value ranges from 0 to 1, where 0 indicates a single thread and 1 indicates all available threads. Values between 0 and 1 use up to that percentage of available threads. Values outside this range are ignored.
Defaults to -1.
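
The correspondence between a formula and a label/features pair can be seen with base R alone. This is a sketch of the equivalence only; how the wrapper itself parses the formula is not stated here, and the variable names are illustrative.

```r
# Split a two-sided formula into the pieces that label and features would carry.
f <- Y ~ X1 + X2

label    <- all.vars(f[[2]])            # left-hand side  -> "Y"
features <- as.list(all.vars(f[[3]]))   # right-hand side -> list("X1", "X2")

print(label)
print(unlist(features))
```

Under that reading, formula = Y~X1+X2 describes the same model as label='Y' with features=list('X1', 'X2').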

Examples

First, initialize a "hanaml.OnlineLinearRegression" object:


> olr <- hanaml.OnlineLinearRegression(enet.lambda=0.1,
                                       enet.alpha=0.5,
                                       max.iter=1200,
                                       tol=1E-6)

Three rounds of data:


> df.1$Collect()
  ID      Y    X1    X2     X3
0  1  130.0   7.0  26.0 -888.0
1  2  124.0   1.0  29.0 -888.0
2  3  262.0  11.0  56.0 -888.0
3  4  162.0  11.0  31.0 -888.0

> df.2$Collect()
   ID      Y    X1    X2     X3
0   5  234.0   7.0  52.0 -888.0
1   6  258.0  11.0  55.0 -888.0
2   7  298.0   3.0  71.0 -888.0
3   8  132.0   1.0  31.0 -888.0

> df.3$Collect()
   ID      Y    X1    X2     X3
0   9  227.0   2.0  54.0 -888.0
1  10  256.0  21.0  47.0 -888.0
2  11  168.0   1.0  40.0 -888.0
3  12  302.0  11.0  66.0 -888.0
4  13  307.0  10.0  68.0 -888.0

Round 1, invoke fit() of olr for training the model with df.1:


> olr$fit(df.1, key='ID', label='Y', features=list('X1', 'X2'))

Output:


> olr$coefficients$Collect()
       VARIABLE_NAME  COEFFICIENT_VALUE
0  __PAL_INTERCEPT__           5.076245
1                 X1           2.987277
2                 X2           4.000540

> olr$intermediate.result$Collect()
   SEQUENCE                                 INTERMEDIATE_MODEL
0         0  "algorithm":"batch_algorithm","batch_algorith...

Round 2, invoke fit() for training the model with df.2:


> olr$fit(df.2, key='ID', label='Y', features=list('X1', 'X2'))

Output:


> olr$coefficients$Collect()
       VARIABLE_NAME  COEFFICIENT_VALUE
0  __PAL_INTERCEPT__           5.094444
1                 X1           2.988419
2                 X2           3.999563

> olr$intermediate.result$Collect()
   SEQUENCE                                 INTERMEDIATE_MODEL
0         0  "algorithm":"batch_algorithm","batch_algorith...

Round 3, invoke fit() for training the model with df.3:


> olr$fit(df.3, key='ID', label='Y', features=list('X1', 'X2'))

Output:


> olr$coefficients$Collect()
       VARIABLE_NAME  COEFFICIENT_VALUE
0  __PAL_INTERCEPT__           5.073338
1                 X1           2.994118
2                 X2           3.999389

> olr$intermediate.result$Collect()
   SEQUENCE                                 INTERMEDIATE_MODEL
0         0  "algorithm":"batch_algorithm","batch_algorith...

Call the score() function:


> score <- olr$score(df.2, key='ID', label='Y', features=list('X1', 'X2'))
0.9999997918249237
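
This page does not say what score() measures. Assuming it is the coefficient of determination (R squared), the value can be checked by hand in base R from the round 3 coefficients printed above and the df.2 data. The r_squared helper is hypothetical, and because the printed coefficients are rounded to six decimals, the result can only agree with the printed score approximately.

```r
# Hand check under the assumption that score() reports R squared.
r_squared <- function(y, y_hat) {
  1 - sum((y - y_hat)^2) / sum((y - mean(y))^2)
}

# Rows of df.2 and the coefficients printed after round 3.
df2 <- data.frame(Y = c(234, 258, 298, 132),
                  X1 = c(7, 11, 3, 1),
                  X2 = c(52, 55, 71, 31))
coefs <- c(intercept = 5.073338, X1 = 2.994118, X2 = 3.999389)

y_hat <- coefs[["intercept"]] + coefs[["X1"]] * df2$X1 + coefs[["X2"]] * df2$X2
r2 <- r_squared(df2$Y, y_hat)
print(r2, digits = 10)
```

The hand-computed value matches the printed score to roughly nine decimal places, which is consistent with the R squared interpretation but does not confirm it.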
