This algorithm is the online version of Multi-Class Logistic Regression, whose standard form is the offline (batch) version. The two differ in the training phase. The offline/batch algorithm requires all training data to be fed in as one batch, and from it produces the single model that best fits that data. This assumes that the computer has enough memory to store all of the data and that all of the data can be obtained at once. The online algorithm applies when either or both of these assumptions do not hold.
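The online idea can be illustrated with a minimal base-R sketch. It assumes plain fixed-step SGD on the softmax log-loss, which is not PAL's solver (PAL runs inside SAP HANA and uses the learning rate schedules described below), so its coefficients will not match the example output. The helpers `softmax_rows` and `online_update` are hypothetical. The only state carried from one batch to the next is the weight matrix, so an earlier batch never has to be kept in memory. The two batches are the df.1 and df.2 data from the example.

```r
# Sketch: online (batch-by-batch) SGD for multi-class logistic regression.
# Assumption: plain fixed-step SGD on the softmax log-loss, not PAL's solver.

softmax_rows <- function(scores) {
  scores <- scores - apply(scores, 1, max)  # subtract row maxima for stability
  e <- exp(scores)
  e / rowSums(e)
}

# Run one SGD pass over a single batch and return the updated weights.
# X: n x p numeric matrix; y: class indices in 1..K;
# W: (p + 1) x K weight matrix whose first row holds the intercepts.
online_update <- function(W, X, y, lr = 0.1) {
  Xb <- cbind(1, X)                         # prepend the intercept column
  for (i in seq_len(nrow(Xb))) {
    xi <- Xb[i, , drop = FALSE]             # 1 x (p + 1)
    p <- softmax_rows(xi %*% W)             # 1 x K class probabilities
    target <- numeric(ncol(W))
    target[y[i]] <- 1                       # one-hot encoding of the label
    W <- W - lr * (t(xi) %*% (p - target))  # gradient step on the log-loss
  }
  W
}

# Batches taken from df.1 and df.2 in the example (labels 0..2 shifted to 1..3).
X.1 <- cbind(c(1.160456, 1.216722, 1.018474, 0.884580,
               2.432160, 1.573506, 1.285611, 0.478364),
             c(-0.079584, -1.315348, -0.600647, 1.546115,
               0.425895, -0.019852, -2.004879, -1.791279))
y.1 <- c(0, 2, 1, 1, 1, 0, 1, 2) + 1
X.2 <- cbind(c(-1.799803, 0.552956, 0.750153, 2.024223,
               1.204173, 1.745183, 1.406053, 1.880983),
             c(1.225313, -2.134007, -1.332960, -1.406925,
               -1.395284, 0.647891, 0.180530, -1.627834))
y.2 <- c(1, 2, 2, 2, 1, 0, 0, 2) + 1

W.0 <- matrix(0, nrow = 3, ncol = 3)  # 2 features + intercept, 3 classes
W.1 <- online_update(W.0, X.1, y.1)   # round 1: only df.1 is needed
W.2 <- online_update(W.1, X.2, y.2)   # round 2: continue from W.1
```

A batch solver would instead need X.1 and X.2 stacked together in memory before it could start.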
OnlineMultiLogisticRegression(
  class.label = NULL,
  init.learning.rate = NULL,
  decay = NULL,
  drop.rate = NULL,
  step.boundaries = NULL,
  constant.values = NULL,
  enet.alpha = NULL,
  enet.lambda = NULL,
  shuffle = NULL,
  shuffle.seed = NULL,
  weight.avg = NULL,
  weight.avg.begin = NULL,
  learning.rate.type = NULL,
  general.learning.rate = NULL,
  stair.case = NULL,
  cycle = NULL,
  epsilon = NULL,
  window.size = NULL
)
class.label, list of characters
Specifies the class labels. At least two class labels are required.
init.learning.rate, double, optional
The initial learning rate for learning rate schedule.
Value should be larger than 0.
Only valid when learning.rate.type is
"Inverse.time.decay", "Exponential.decay", "Polynomial.decay".
decay, double, optional
Specifies the learning rate decay speed for the learning rate schedule.
A larger value indicates faster decay.
The value should be larger than 0.
When learning.rate.type is "Exponential.decay",
the value should be larger than 1.
Only valid when learning.rate.type
is "Inverse.time.decay", "Exponential.decay" or "Polynomial.decay".
drop.rate, integer, optional
Specifies the decay frequency.
Its effect is most apparent when stair.case is TRUE.
The value should be larger than 0.
Only valid when learning.rate.type is "Inverse.time.decay",
"Exponential.decay" or "Polynomial.decay".
step.boundaries, character, optional
Specifies the step boundaries of the regions within which the step size remains constant.
The value is a comma-separated list of unsigned integers.
Step values start from 0, and the values should be in increasing order.
An empty value is allowed.
Only valid when learning.rate.type is "Piecewise.constant.decay".
constant.values, character, optional
Specifies the constant step size for each region defined by step.boundaries.
The value is a comma-separated list of doubles.
It must always contain exactly one more value than step.boundaries.
Only valid when learning.rate.type is "Piecewise.constant.decay".
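How step.boundaries and constant.values fit together can be sketched in base R. The helpers `parse_piecewise` and `piecewise_lr` are hypothetical; they check the documented rules (unsigned, increasing boundaries, an empty boundary string allowed, one more value than boundaries) and look up the step size for a step counter. Whether a step exactly equal to a boundary belongs to the earlier or the later region is not stated above; this sketch assumes the later one.

```r
# Hypothetical helpers: parse the two comma-separated strings and look up
# the step size for a given step counter.
parse_piecewise <- function(step.boundaries, constant.values) {
  b <- if (nzchar(step.boundaries)) {
    as.integer(trimws(strsplit(step.boundaries, ",", fixed = TRUE)[[1]]))
  } else {
    integer(0)                                # an empty value is allowed
  }
  v <- as.numeric(trimws(strsplit(constant.values, ",", fixed = TRUE)[[1]]))
  stopifnot(all(b >= 0), all(diff(b) > 0),    # unsigned, increasing
            length(v) == length(b) + 1)       # one more value than boundaries
  list(boundaries = b, values = v)
}

piecewise_lr <- function(step, schedule) {
  if (length(schedule$boundaries) == 0) {
    return(rep(schedule$values[1], length(step)))  # a single region
  }
  # findInterval() counts the boundaries that are <= step, so a step equal to
  # a boundary falls into the next region (an assumed convention).
  schedule$values[findInterval(step, schedule$boundaries) + 1]
}

sched <- parse_piecewise("10,20", "0.1,0.05,0.01")
lr <- piecewise_lr(c(0, 9, 10, 25), sched)   # 0.1 0.1 0.05 0.01
```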
enet.alpha, double, optional
Elastic-Net mixing parameter.
The valid range is [0, 1]. A value of 0 gives the Ridge penalty;
a value of 1 gives the Lasso penalty.
Only valid when enet.lambda is not 0.0.
Defaults to 1.0.
enet.lambda, double, optional
Penalty constant. The value should be larger than or equal to 0.0.
The higher the value, the stronger the regularization.
When it equals 0.0, there is no regularization.
Defaults to 0.0.
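The roles of enet.alpha and enet.lambda can be made concrete with the common elastic-net form lambda * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2), which reduces to Ridge at alpha = 0 and Lasso at alpha = 1 as described above. The exact scaling PAL uses (for example the 1/2 factor, or whether intercepts are penalized) is an assumption here, and `enet_penalty` is a hypothetical helper.

```r
# Hypothetical helper: elastic-net penalty in the glmnet-style parameterisation.
# The 1/2 factor on the L2 term is an assumed convention.
enet_penalty <- function(w, enet.alpha = 1.0, enet.lambda = 0.0) {
  l1 <- sum(abs(w))   # Lasso part
  l2 <- sum(w^2) / 2  # Ridge part
  enet.lambda * (enet.alpha * l1 + (1 - enet.alpha) * l2)
}

w <- c(1, -2)
enet_penalty(w, enet.alpha = 1, enet.lambda = 1)  # pure Lasso: 3
enet_penalty(w, enet.alpha = 0, enet.lambda = 1)  # pure Ridge: 2.5
enet_penalty(w)                                   # defaults, no regularization: 0
```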
shuffle, logical, optional
Logical value indicating whether to shuffle the row order of the observation data.
FALSE means keeping the original order; TRUE means performing the shuffle operation.
Defaults to FALSE.
shuffle.seed, integer, optional
The seed used to initialize the random generator for the shuffle operation.
The value should be larger than or equal to 0.
To reproduce results when performing the shuffle operation,
set this value to a non-zero integer.
Only valid when shuffle is TRUE.
Defaults to 0.
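For intuition only, the effect of a fixed seed can be shown client-side in base R: the same non-zero seed yields the same shuffled order. R's generator is not the one PAL uses on the server, so these orders will not match PAL's; `shuffle_rows` is a hypothetical helper.

```r
# Hypothetical helper using R's own RNG (not PAL's server-side generator).
shuffle_rows <- function(df, shuffle.seed = 0) {
  # Mirrors the documented behaviour: a non-zero seed gives a reproducible order.
  if (shuffle.seed != 0) set.seed(shuffle.seed)
  df[sample(nrow(df)), , drop = FALSE]
}

d <- data.frame(X1 = 1:8, Y = rep(0:1, 4))
a <- shuffle_rows(d, shuffle.seed = 42)
b <- shuffle_rows(d, shuffle.seed = 42)  # identical to a
```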
weight.avg, logical, optional
Logical value indicating whether to apply an averaging operator to the output model.
FALSE means the model is output directly;
TRUE means the averaging operator is applied to the output model.
Currently only Polyak-Ruppert averaging is supported.
Defaults to FALSE.
weight.avg.begin, integer, optional
Specifies the step counter at which averaging of the model begins.
The value should be larger than or equal to 0. While the current step counter is less than this value,
the model is output directly. Only valid when weight.avg is TRUE.
Defaults to 0.
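Polyak-Ruppert averaging outputs the running mean of the SGD iterates instead of the latest iterate. A base-R sketch of how weight.avg.begin could gate it: before the start step the raw model is returned, and from then on an incremental mean is kept. The closure `make_averager` is hypothetical, and scalars stand in for weight vectors.

```r
# Hypothetical helper: returns a closure that takes the current weights and
# step counter and returns the model to output.
make_averager <- function(weight.avg.begin = 0) {
  avg <- NULL  # running average of the iterates seen since weight.avg.begin
  n <- 0       # number of iterates included in the average
  function(w, step) {
    if (step < weight.avg.begin) return(w)  # before the start: raw model
    n <<- n + 1
    avg <<- if (is.null(avg)) w else avg + (w - avg) / n  # incremental mean
    avg
  }
}

averager <- make_averager(weight.avg.begin = 2)
iterates <- c(10, 20, 30, 40, 50)  # toy scalar "weights" at steps 0..4
out <- sapply(0:4, function(s) averager(iterates[s + 1], s))
# out is c(10, 20, 30, 35, 40)
```

The incremental update avg + (w - avg) / n avoids storing past iterates, which suits the online setting.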
learning.rate.type, character, optional
Specifies the learning rate type for the SGD algorithm. Valid values are:
- "Inverse.time.decay"
- "Exponential.decay"
- "Polynomial.decay"
- "Piecewise.constant.decay"
- "AdaGrad"
- "AdaDelta"
- "RMSProp"
Defaults to "RMSProp".
general.learning.rate, double, optional
Specifies the general learning rate used in AdaGrad and RMSProp.
The value should be larger than 0.
Only valid when learning.rate.type is "AdaGrad" or "RMSProp".
Defaults to 0.001.
stair.case, logical, optional
Logical value indicating how the step size drops. FALSE means the step size drops smoothly;
TRUE means it drops in discrete steps whose frequency is set by drop.rate.
Only valid when learning.rate.type is "Inverse.time.decay" or "Exponential.decay".
Defaults to FALSE.
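The interplay of init.learning.rate, decay, drop.rate, stair.case and epsilon for the two decay types can be sketched in base R. Both formulas are assumptions: the inverse-time form follows the common definition lr0 / (1 + decay * t / drop.rate), and the exponential form lr0 * decay^(-t / drop.rate) is chosen only because it is consistent with the documented requirement decay > 1. `decayed_lr` is a hypothetical helper.

```r
# Hypothetical helper; both formulas are assumptions, not PAL's definitions.
decayed_lr <- function(step, init.learning.rate, decay, drop.rate,
                       type = c("Inverse.time.decay", "Exponential.decay"),
                       stair.case = FALSE, epsilon = 1e-8) {
  type <- match.arg(type)
  progress <- step / drop.rate
  if (stair.case) progress <- floor(progress)  # drop in discrete jumps
  lr <- switch(type,
    Inverse.time.decay = init.learning.rate / (1 + decay * progress),
    Exponential.decay  = init.learning.rate * decay^(-progress))
  pmax(lr, epsilon)  # epsilon acts as the smallest allowable step size
}

steps <- 0:6
smooth <- decayed_lr(steps, 0.1, 0.5, 2, "Inverse.time.decay")
stairs <- decayed_lr(steps, 0.1, 0.5, 2, "Inverse.time.decay", stair.case = TRUE)
```

With stair.case = TRUE the rate stays flat between multiples of drop.rate and then drops, while the smooth version decreases at every step.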
cycle, logical, optional
Logical value indicating whether to cycle back to the start when the specified end learning rate is reached.
FALSE means do not cycle from the start; TRUE means cycle from the start.
Only valid when learning.rate.type is "Polynomial.decay".
Defaults to FALSE.
epsilon, double, optional
This parameter has multiple purposes depending on the learning rate type.
The value should be within (0, 1).
When learning.rate.type is "Inverse.time.decay" or "Exponential.decay", it represents the smallest allowable step size:
once the step size reaches this value, it no longer changes.
When learning.rate.type is "Polynomial.decay", it represents the end learning rate.
When learning.rate.type is "AdaGrad", "AdaDelta" or "RMSProp", it is used to avoid division by 0.
Only valid when learning.rate.type is not "Piecewise.constant.decay".
Defaults to 1E-8.
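For "Polynomial.decay", the roles of epsilon (end learning rate) and cycle can be sketched with the widely used polynomial decay formula (as in TensorFlow's polynomial decay schedule). The mapping of decay to the power and drop.rate to the number of decay steps is an assumption not stated above, and `polynomial_lr` is a hypothetical helper.

```r
# Hypothetical helper. Assumed mapping: decay = power, drop.rate = decay steps,
# epsilon = end learning rate.
polynomial_lr <- function(step, init.learning.rate, decay, drop.rate,
                          epsilon = 1e-8, cycle = FALSE) {
  if (cycle) {
    # Extend the horizon to the next multiple of drop.rate, so the rate
    # jumps back up instead of staying at epsilon.
    horizon <- drop.rate * pmax(ceiling(step / drop.rate), 1)
  } else {
    horizon <- drop.rate
    step <- pmin(step, drop.rate)  # hold at epsilon after the horizon
  }
  (init.learning.rate - epsilon) * (1 - step / horizon)^decay + epsilon
}

polynomial_lr(c(0, 5, 10, 20), 0.1, decay = 1, drop.rate = 10, epsilon = 0.001)
polynomial_lr(11, 0.1, decay = 1, drop.rate = 10, epsilon = 0.001, cycle = TRUE)
```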
window.size, double, optional
Controls the size of the moving window of recent steps. The value should be in the range (0, 1).
A larger value means more recent steps are taken into account.
Only valid when learning.rate.type is "AdaDelta" or "RMSProp".
Defaults to 0.9.
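How general.learning.rate, epsilon and window.size enter the adaptive methods can be sketched with the textbook AdaGrad, RMSProp and AdaDelta updates. Treating window.size as the decay factor of the moving averages matches the description above, but the exact placement of epsilon (inside or outside the square root) varies between implementations and is an assumption here. `adaptive_step` is a hypothetical helper, demonstrated on the toy objective sum((w - 3)^2).

```r
# Hypothetical helper: one update of w given gradient g; state holds the
# running accumulators between calls.
adaptive_step <- function(w, g, state, type = c("AdaGrad", "RMSProp", "AdaDelta"),
                          general.learning.rate = 0.001, epsilon = 1e-8,
                          window.size = 0.9) {
  type <- match.arg(type)
  if (is.null(state$g2)) state$g2 <- 0 * w    # accumulated squared gradients
  if (is.null(state$dx2)) state$dx2 <- 0 * w  # AdaDelta: squared updates
  rho <- window.size
  if (type == "AdaGrad") {
    state$g2 <- state$g2 + g^2                    # sum over all past steps
    dx <- -general.learning.rate * g / (sqrt(state$g2) + epsilon)
  } else if (type == "RMSProp") {
    state$g2 <- rho * state$g2 + (1 - rho) * g^2  # moving average
    dx <- -general.learning.rate * g / (sqrt(state$g2) + epsilon)
  } else {
    state$g2 <- rho * state$g2 + (1 - rho) * g^2
    # AdaDelta uses no general learning rate.
    dx <- -sqrt(state$dx2 + epsilon) / sqrt(state$g2 + epsilon) * g
    state$dx2 <- rho * state$dx2 + (1 - rho) * dx^2
  }
  list(w = w + dx, state = state)
}

grad <- function(w) 2 * (w - 3)  # gradient of sum((w - 3)^2)
w <- 0
st <- list()
for (i in 1:500) {
  r <- adaptive_step(w, grad(w), st, "RMSProp", general.learning.rate = 0.1)
  w <- r$w
  st <- r$state
}
```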
An "OnlineMultiLogisticRegression" object with the following attributes:
coef: DataFrame
Coefficient values of the multi-class logistic regression model.
online.result: DataFrame
Updated online training result.
The fit() method takes the following arguments:
data, DataFrame
key, character, optional
features, character or list of characters, optional
label, character, optional
formula, formula type, optional
thread.ratio, double, optional
progress.indicator.id, character, optional
First, initialize an online multi-class logistic regression instance:
> omlr <- OnlineMultiLogisticRegression(class.label=list("0","1","2"),
enet.alpha=0.2, weight.avg=TRUE,
weight.avg.begin=8, learning.rate.type = "rmsprop",
window.size=0.9, epsilon = 1e-6)
Four rounds of data:
> df.1$Collect()
X1 X2 Y
0 1.160456 -0.079584 0.0
1 1.216722 -1.315348 2.0
2 1.018474 -0.600647 1.0
3 0.884580 1.546115 1.0
4 2.432160 0.425895 1.0
5 1.573506 -0.019852 0.0
6 1.285611 -2.004879 1.0
7 0.478364 -1.791279 2.0
> df.2$Collect()
X1 X2 Y
0 -1.799803 1.225313 1.0
1 0.552956 -2.134007 2.0
2 0.750153 -1.332960 2.0
3 2.024223 -1.406925 2.0
4 1.204173 -1.395284 1.0
5 1.745183 0.647891 0.0
6 1.406053 0.180530 0.0
7 1.880983 -1.627834 2.0
> df.3$Collect()
X1 X2 Y
0 1.860634 -2.474313 2.0
1 0.710662 -3.317885 2.0
2 1.153588 0.539949 0.0
3 1.297490 -1.811933 2.0
4 2.071784 0.351789 0.0
5 1.552456 0.550787 0.0
6 1.202615 -1.256570 2.0
7 -2.348316 1.384935 1.0
> df.4$Collect()
X1 X2 Y
0 -2.132380 1.457749 1.0
1 0.549665 0.174078 1.0
2 1.422629 0.815358 0.0
3 1.318544 0.062472 0.0
4 0.501686 -1.286537 1.0
5 1.541711 0.737517 1.0
6 1.709486 -0.036971 0.0
7 1.708367 0.761572 0.0
Round 1: train the model on df.1 by calling fit():
> omlr$fit(df.1, label='Y', features=list('X1', 'X2'))
> omlr$coef$Collect()
0 __PAL_INTERCEPT__ 0 -0.245137
1 __PAL_INTERCEPT__ 1 0.112396
2 __PAL_INTERCEPT__ 2 -0.236284
3 X1 0 -0.189930
4 X1 1 0.218920
5 X1 2 -0.372500
6 X2 0 0.279547
7 X2 1 0.458214
8 X2 2 -0.185378
Round 2: continue training on df.2 by calling fit() again:
> omlr$fit(df.2, label='Y', features=list('X1', 'X2'))
> omlr$coef$Collect()
0 __PAL_INTERCEPT__ 0 -0.359296
1 __PAL_INTERCEPT__ 1 0.163218
2 __PAL_INTERCEPT__ 2 -0.182423
3 X1 0 -0.045149
4 X1 1 -0.046508
5 X1 2 -0.122690
6 X2 0 0.420425
7 X2 1 0.594954
8 X2 2 -0.451050
Round 3: continue training on df.3 by calling fit() again:
> omlr$fit(df.3, label='Y', features=list('X1', 'X2'))
> omlr$coef$Collect()
0 __PAL_INTERCEPT__ 0 -0.225687
1 __PAL_INTERCEPT__ 1 0.031453
2 __PAL_INTERCEPT__ 2 -0.173944
3 X1 0 0.100580
4 X1 1 -0.208257
5 X1 2 -0.097395
6 X2 0 0.628975
7 X2 1 0.576544
8 X2 2 -0.582955
Round 4: continue training on df.4 by calling fit() again:
> omlr$fit(df.4, label='Y', features=list('X1', 'X2'))
> omlr$coef$Collect()
0 __PAL_INTERCEPT__ 0 -0.204118
1 __PAL_INTERCEPT__ 1 0.071965
2 __PAL_INTERCEPT__ 2 -0.263698
3 X1 0 0.239740
4 X1 1 -0.326290
5 X1 2 -0.139859
6 X2 0 0.696389
7 X2 1 0.590014
8 X2 2 -0.643752
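The long-format coef table can be turned into class probabilities for a new row, assuming the model scores each class linearly (intercept plus feature coefficients) and normalizes the scores with softmax. That is the usual multinomial formulation, but it is not confirmed above. The sketch copies the Round 4 coefficients into a local base-R data frame; `predict_proba` is a hypothetical helper, not a package method.

```r
# Round 4 coefficients copied from the output above (variable, class, value).
coef <- data.frame(
  variable = rep(c("__PAL_INTERCEPT__", "X1", "X2"), each = 3),
  class    = rep(c("0", "1", "2"), times = 3),
  value    = c(-0.204118,  0.071965, -0.263698,
                0.239740, -0.326290, -0.139859,
                0.696389,  0.590014, -0.643752),
  stringsAsFactors = FALSE)

# Reshape into a (variables x classes) matrix; rows/columns are addressed by
# name below, so the sort order chosen by tapply() does not matter.
W <- tapply(coef$value, list(coef$variable, coef$class), sum)

# Hypothetical helper: softmax over per-class linear scores (assumed model).
predict_proba <- function(W, X1, X2) {
  x <- c("__PAL_INTERCEPT__" = 1, X1 = X1, X2 = X2)
  s <- drop(x[rownames(W)] %*% W)  # one linear score per class
  e <- exp(s - max(s))             # subtract the max for numerical stability
  setNames(e / sum(e), colnames(W))
}

p <- predict_proba(W, 1.160456, -0.079584)  # first row of df.1
```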