| data |
DataFrame
DataFrame containing the data.
|
| key |
character, optional
Name of the ID column.
If not provided, the data is assumed to have no ID column.
No default value.
|
| features |
character or list of characters, optional
Names of the feature columns.
If not provided, it defaults to all non-key, non-label columns of data.
|
| label |
character, optional
Name of the column which specifies the dependent variable.
Defaults to the last column of data if not provided.
|
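The defaulting rules for key, features and label can be illustrated on a local R data.frame. This is a minimal sketch: resolve_columns() is a hypothetical helper written for illustration, not a hana.ml.r function. It assumes that the label default is taken from the last column of the full data, as described above.

```r
# Hypothetical helper (not part of hana.ml.r) mirroring the documented
# defaults: label = last column, features = all non-key, non-label columns.
resolve_columns <- function(df, key = NULL, features = NULL, label = NULL) {
  cols <- names(df)
  if (is.null(label)) {
    label <- cols[length(cols)]                 # last column of data
  }
  if (is.null(features)) {
    features <- setdiff(cols, c(key, label))    # drop key and label
  }
  list(key = key, features = features, label = label)
}

df <- data.frame(ID = 1:4,
                 V1 = c(1.2, 0.4, 3.1, 2.2),
                 V2 = c(0, 1, 1, 0),
                 CATEGORY = c("A", "B", "A", "B"))
res <- resolve_columns(df, key = "ID")
res$label     # "CATEGORY"
res$features  # "V1" "V2"
```

Without a key, the ID column would simply be treated as another feature.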
| formula |
formula, optional
Formula to be used for model generation,
in the form label ~ <feature_list>,
e.g. formula = CATEGORY ~ V1 + V2 + V3.
Provide either formula or a features/label combination,
but not both.
Defaults to NULL.
|
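The following sketch shows how a formula of the form label ~ <feature_list> splits into a label and feature names, using only base R (all.vars() and terms()). check_spec() is a hypothetical guard written for illustration. It enforces the rule that formula and features/label must not both be given. It is not the package's own validation code.

```r
# Split a two-sided formula into its label and feature names.
f <- CATEGORY ~ V1 + V2 + V3

label    <- all.vars(f[[2]])                 # left-hand side: "CATEGORY"
features <- attr(terms(f), "term.labels")    # right-hand side: "V1" "V2" "V3"

# Hypothetical guard: formula and features/label are mutually exclusive.
check_spec <- function(formula = NULL, features = NULL, label = NULL) {
  if (!is.null(formula) && (!is.null(features) || !is.null(label))) {
    stop("Provide either 'formula' or 'features'/'label', not both.")
  }
  invisible(TRUE)
}

check_spec(formula = f)                         # passes
# check_spec(formula = f, label = "CATEGORY")   # would raise an error
```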
| enet.alpha |
double, optional
Elastic net mixing parameter.
Only valid when solver is 'cyclical' or 'proximal'.
Defaults to 1.0.
|
| enet.lambda |
double, optional
Penalty weight.
Only valid when solver is 'cyclical' or 'proximal'.
|
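To show how enet.alpha and enet.lambda interact, here is a sketch of the elastic net penalty in the widely used glmnet-style parameterization. That this matches PAL's exact objective is an assumption, so check the PAL reference before relying on it. With alpha = 1 the penalty is pure L1 (lasso), and with alpha = 0 it is pure L2 (ridge). enet_penalty() is a hypothetical helper.

```r
# Assumed glmnet-style elastic net penalty:
#   lambda * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2)
enet_penalty <- function(w, alpha, lambda) {
  l1 <- sum(abs(w))   # lasso part
  l2 <- sum(w^2)      # ridge part (squared L2 norm)
  lambda * (alpha * l1 + (1 - alpha) / 2 * l2)
}

w <- c(0.5, -1.0, 2.0)                       # example coefficients, no intercept
enet_penalty(w, alpha = 1.0, lambda = 0.1)   # 0.35   (0.1 * 3.5)
enet_penalty(w, alpha = 0.0, lambda = 0.1)   # 0.2625 (0.1 * 5.25 / 2)
```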
| tol |
double, optional
Convergence threshold for exiting iterations.
Defaults to 1.0e-7 when solver is 'cyclical',
otherwise it defaults to 1.0e-6.
|
| epsilon |
double, optional
Determines the accuracy with which the solution is to be found.
Defaults to 1.0e-6 when solver is 'newton', or 1.0e-5 when solver is 'lbfgs'.
|
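The solver-dependent defaults of tol and epsilon can be written as a small lookup. The two functions below are hypothetical, not part of hana.ml.r. They only encode the values stated above and return NA where no epsilon default is documented.

```r
# Hypothetical lookups encoding the documented tol/epsilon defaults.
default_tol <- function(solver) {
  if (solver == "cyclical") 1.0e-7 else 1.0e-6
}

default_epsilon <- function(solver) {
  switch(solver,
         newton = 1.0e-6,
         lbfgs  = 1.0e-5,
         NA_real_)   # no default documented for other solvers
}

default_tol("cyclical")    # 1e-07
default_epsilon("lbfgs")   # 1e-05
```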
| solver |
character, optional
Optimization algorithm.
Possible values include:
"auto" - Automatically determined from the data and other parameters.
"newton" - Newton iteration method.
"cyclical" - Cyclical coordinate descent method to
fit elastic net regularized logistic regression.
"lbfgs" - L-BFGS method. Recommended when there are
many independent variables.
"stochastic" - Stochastic gradient descent method.
Recommended for very large datasets.
"proximal" - Proximal gradient descent method to fit
elastic net regularized logistic regression.
All values are available when multi.class is FALSE,
otherwise only "lbfgs" and "cyclical" are available.
Defaults to "auto" when multi.class is FALSE,
and "lbfgs" when multi.class is TRUE. |
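The solver rules above can be sketched as a validation step: the default depends on multi.class, and only "lbfgs" and "cyclical" are accepted for multi-class problems. resolve_solver() is a hypothetical helper built on base R's match.arg(). It is not the package's actual argument checking.

```r
# Hypothetical helper applying the documented solver rules.
resolve_solver <- function(solver = NULL, multi.class = FALSE) {
  all_solvers <- c("auto", "newton", "cyclical", "lbfgs",
                   "stochastic", "proximal")
  allowed <- if (multi.class) c("lbfgs", "cyclical") else all_solvers
  if (is.null(solver)) {
    solver <- if (multi.class) "lbfgs" else "auto"   # documented defaults
  }
  match.arg(solver, allowed)   # errors if solver is not allowed
}

resolve_solver()                     # "auto"
resolve_solver(multi.class = TRUE)   # "lbfgs"
```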
| max.iter |
integer, optional
Maximum number of iterations taken for the solvers to converge.
If convergence is not reached after this number, an error will be generated.
For multi.class, the default value is 100.
For binary classification, the default value is 100000 when solver is "cyclical",
1000 when solver is "proximal", and 100 otherwise.
|
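The max.iter defaults depend on both multi.class and solver. The hypothetical lookup below, which is not a hana.ml.r function, encodes exactly the values listed above.

```r
# Hypothetical lookup encoding the documented max.iter defaults.
default_max_iter <- function(solver, multi.class = FALSE) {
  if (multi.class) return(100L)   # multi-class: always 100
  switch(solver,
         cyclical = 100000L,
         proximal = 1000L,
         100L)                    # all other binary solvers
}

default_max_iter("cyclical")          # 100000
default_max_iter("cyclical", TRUE)    # 100
```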
| thread.ratio |
double, optional
Specifies the ratio of the total number of threads that can be used by this function.
The value range is from 0 to 1, where 0 indicates a single thread,
and 1 indicates all the currently available threads.
Values outside this range tell PAL to heuristically determine
the number of threads to use.
Only valid when multi.class is FALSE.
Defaults to 1.0 for Logistic regression.
|
| standardize |
logical, optional
Controls whether to standardize the data to have zero mean and unit variance.
FALSE - the data is not standardized.
TRUE - the data is standardized to zero mean and unit variance.
Defaults to TRUE.
|
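The transformation that standardize = TRUE refers to can be seen with base R's scale(). Each column is centred to mean 0 and divided by its standard deviation. PAL standardizes internally. Note that scale() uses the sample standard deviation (n - 1 denominator), and whether PAL uses the same convention is an assumption here.

```r
# Illustrate zero-mean, unit-variance scaling with base R.
X <- cbind(V1 = c(1, 2, 3, 4), V2 = c(10, 20, 30, 40))
Z <- scale(X, center = TRUE, scale = TRUE)

colMeans(Z)       # 0 0 (up to floating-point error)
apply(Z, 2, sd)   # 1 1
```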
| max.pass.number |
integer, optional
The maximum number of passes over the data.
Only valid when solver is "stochastic" and multi.class is FALSE.
Defaults to 1.
|
| lbfgs.m |
integer, optional
Number of past updates to be kept.
Only available when solver is "lbfgs".
Defaults to 6.
|
| pmml.export |
"no", "single-row", "multi-row"
Controls whether to output a PMML representation of the model and how to format
the PMML.
For multi.class, valid options are:
For binary classification:
"no" - No PMML model.
"single-row" - Exports a PMML model in a maximum of
one row. Fails if the model doesn't fit in one row.
"multi-row" - Exports a PMML model, splitting it
across multiple rows if it doesn't fit in one.
Defaults to "no". |
| stat.inf |
logical, optional
Indicates whether to calculate statistical inferences
from the given data.
Defaults to FALSE. |
| categorical.variable |
character or list/vector of characters, optional
Specifies which features should be treated as categorical variables.
The default behavior depends on the column type of the input.
Valid only for variables of INTEGER type; ignored otherwise.
No default value. |
| class.map0 |
character, optional
Categorical label to map to 0.
Only valid when multi.class is FALSE.
class.map0 is mandatory when the label column type is VARCHAR or
NVARCHAR during binary-class fitting and scoring.
|
| class.map1 |
character, optional
Categorical label to map to 1.
Only valid when multi.class is FALSE.
class.map1 is mandatory when the label column type is VARCHAR or
NVARCHAR during binary-class fitting and scoring.
|
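In effect, class.map0 and class.map1 define how a character label column is turned into a 0/1 response. Below is a minimal sketch of that encoding. encode_label() is hypothetical and written for illustration, and rejecting unmapped labels is a design choice of the sketch, not documented package behavior.

```r
# Hypothetical helper: map character labels to 0/1 via class.map0/class.map1.
encode_label <- function(y, class.map0, class.map1) {
  y <- as.character(y)
  unknown <- setdiff(unique(y), c(class.map0, class.map1))
  if (length(unknown) > 0) {
    stop("Labels not covered by class.map0/class.map1: ",
         paste(unknown, collapse = ", "))
  }
  as.integer(y == class.map1)   # class.map1 -> 1, class.map0 -> 0
}

y <- c("no", "yes", "yes", "no")
encode_label(y, class.map0 = "no", class.map1 = "yes")   # 0 1 1 0
```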
| multi.class |
logical, optional
If set to TRUE, a multi-class classification is performed.
Otherwise, there must be only two classes.
Defaults to FALSE.
|
| sgd.batch.number |
integer, optional
Number of batches used by the stochastic gradient descent solver.
Valid only when multi.class is FALSE and solver is "stochastic".
Defaults to 1.
|
| precompute |
logical, optional
Whether to precompute the Gram matrix for the cyclical coordinate descent method.
Valid only when solver is "cyclical".
Defaults to TRUE.
|
| handle.missing |
logical, optional
Whether or not to impute the missing values of the input training data.
Defaults to TRUE.
|
| resampling.method |
character, optional
Specifies the resampling method.
Valid options are:
"cv", "stratified_cv", "bootstrap", "stratified_bootstrap".
If no value is specified, neither model evaluation
nor parameter selection is activated.
|
| evaluation.metric |
character, optional
Specifies the evaluation metric for model evaluation or parameter selection.
Currently valid options are: "accuracy", "f1_score", "auc", "nll".
|
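For reference, the four metric names correspond to standard quantities. The sketch below gives common binary-case definitions: accuracy and F1 at a 0.5 threshold, rank-based AUC, and mean negative log-likelihood. Whether PAL uses these exact definitions is an assumption. For example, its nll may be a sum rather than a mean.

```r
# Common binary-case definitions (assumed, not PAL's implementation).
accuracy <- function(y, p, threshold = 0.5) {
  mean((p >= threshold) == (y == 1))
}
f1_score <- function(y, p, threshold = 0.5) {
  pred <- p >= threshold
  tp <- sum(pred & (y == 1))
  fp <- sum(pred & (y == 0))
  fn <- sum((!pred) & (y == 1))
  2 * tp / (2 * tp + fp + fn)
}
auc <- function(y, p) {
  # Mann-Whitney form: ties receive average ranks.
  r  <- rank(p)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}
nll <- function(y, p) {
  # Mean negative log-likelihood (log loss).
  -mean(y * log(p) + (1 - y) * log(1 - p))
}

y <- c(0, 0, 1, 1)
p <- c(0.1, 0.4, 0.35, 0.8)
accuracy(y, p)   # 0.75
f1_score(y, p)   # 0.6666667
auc(y, p)        # 0.75
```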
| fold.num |
integer, optional
Specifies the fold number for cross-validation (CV).
Mandatory and valid only when resampling.method is "cv" or "stratified_cv".
|
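To illustrate what "stratified_cv" with a given fold.num means, here is a sketch of stratified fold assignment. Within each class, rows are shuffled and dealt round-robin over the folds, so each fold keeps roughly the class proportions of the full data. This only illustrates the idea. stratified_folds() and its seed argument are hypothetical, and PAL's internal fold assignment may differ.

```r
# Hypothetical stratified k-fold assignment (illustration only).
stratified_folds <- function(y, fold.num, seed = 1) {
  set.seed(seed)                           # note: resets the global RNG
  folds <- integer(length(y))
  for (cls in unique(y)) {
    idx <- which(y == cls)
    idx <- idx[sample.int(length(idx))]    # shuffle rows within the class
    folds[idx] <- rep_len(seq_len(fold.num), length(idx))
  }
  folds
}

y <- rep(c("A", "B"), times = c(6, 3))
table(fold = stratified_folds(y, fold.num = 3), class = y)
# every fold receives 2 "A" rows and 1 "B" row
```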
| repeat.times |
numeric, optional
Specifies the number of repeat times for resampling.
Defaults to 1.
|
| param.search.strategy |
c("grid", "random"), optional
Specifies the method to activate parameter selection.
If not specified, model parameter selection shall not be triggered.
|
| random.search.times |
integer, optional
Specifies the number of times to randomly select candidate parameters for selection.
Mandatory and valid only when param.search.strategy is "random".
|
| random.state |
numeric, optional
Specifies the seed for random generation.
The system time is used as the seed when 0 is specified.
|
| timeout |
integer, optional
Specifies maximum running time for model evaluation or parameter selection in seconds.
No timeout when 0 is specified.
|
| progress.indicator.id |
character, optional
Sets the ID of the progress indicator for model evaluation or parameter selection.
No progress indicator is active if no value is provided.
|
| parameter.range |
list, optional
Specifies the range of the following parameters for parameter selection:
enet.lambda, enet.alpha.
Each range is specified by 3 numbers in the form c(start, step, end).
Example:
parameter.range <- list(enet.lambda = c(0.01, 0.01, 0.1)) takes
enet.lambda values from 0.01 to 0.1 with a step size of 0.01, i.e.
0.01, 0.02, 0.03, ..., 0.09, 0.1.
If param.search.strategy is "random", the middle term (the step)
has no effect and can be omitted.
|
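The range semantics can be sketched in base R. For grid search, a c(start, step, end) triple expands with seq(). For random search, candidates are drawn between start and end and the step is ignored. expand_range() and sample_range() are hypothetical helpers. Uniform sampling is an assumption about how random search draws values.

```r
# Hypothetical helpers illustrating parameter.range semantics.
expand_range <- function(r) {
  seq(from = r[1], to = r[3], by = r[2])         # grid: start, step, end
}
sample_range <- function(r, n) {
  runif(n, min = r[1], max = r[length(r)])       # random: step unused
}

parameter.range <- list(enet.lambda = c(0.01, 0.01, 0.1))
expand_range(parameter.range$enet.lambda)   # 10 values: 0.01, 0.02, ..., 0.1
sample_range(c(0.01, 0.1), n = 5)           # 5 draws in [0.01, 0.1]
```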
| parameter.values |
list, optional
Specifies values of the following parameters for parameter selection:
enet.lambda, enet.alpha.
Example: parameter.values <- list(enet.lambda = c(0.001, 0.003, 0.007, 0.01))
|
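When both enet.lambda and enet.alpha are listed, a grid search would typically evaluate every combination of the given values. Base R's expand.grid() shows the implied candidate set. The enet.alpha values here are illustrative, not defaults.

```r
# Candidate combinations implied by parameter.values (illustrative values).
parameter.values <- list(enet.lambda = c(0.001, 0.003, 0.007, 0.01),
                         enet.alpha  = c(0.0, 0.5, 1.0))
grid <- expand.grid(parameter.values)

nrow(grid)   # 12 = 4 lambda values x 3 alpha values
head(grid, 3)
```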