R: Logistic Regression

hanaml.LogisticRegression {hana.ml.r}

R Documentation

Logistic Regression

Description

hanaml.LogisticRegression is an R wrapper for PAL Logistic Regression.

Usage

hanaml.LogisticRegression (conn.context, data = NULL, formula = NULL,
                          features = NULL, label = NULL, key = NULL,
                          enet.alpha = NULL, enet.lambda = NULL, tol = NULL,
                          epsilon = NULL, solver = NULL, max.iter = NULL,
                          thread.ratio = NULL, standardize = NULL,
                          max.pass.number = NULL, lbfgs.m = NULL,
                          pmml.export = NULL, stat.inf = NULL,
                          categorical.variable = NULL,
                          class.map0 = NULL, class.map1 = NULL,
                          multi.class = FALSE, sgd.batch.number = NULL,
                          precompute = NULL, handle.missing = NULL)

Arguments

`conn.context`	`ConnectionContext` The connection to the SAP HANA system.
`data`	`DataFrame` DataFrame containing the data.
`key`	`character, optional` Name of the ID column of data. If not provided, then it is assumed that data has no ID column.
`features`	`list of character, optional` Names of the feature columns. If features is not provided, it defaults to all non-ID, non-label columns.
`label`	`character, optional` Name of the column in data that specifies the dependent variable.
`formula`	`formula type, optional` Formula to be used for model generation, in the form label~<feature_list>, e.g. formula=CATEGORY~V1+V2+V3. Provide either the formula or a features and label combination, not both. Defaults to NULL.
`enet.alpha`	`double, optional` Elastic net mixing parameter. Only valid when solver is 'cyclical' or 'proximal'. Defaults to 1.0.
`enet.lambda`	`double, optional` Penalized weight. Only valid when 'solver' is 'cyclical' or 'proximal'.
`tol`	`double, optional` Convergence threshold for exiting iterations. Defaults to 1.0e-7 when solver is 'cyclical', otherwise 1.0e-6.
`epsilon`	`double, optional` Determines the accuracy with which the solution is to be found. Defaults to 1.0e-6 when solver is 'newton', or 1.0e-5 when solver is 'lbfgs'.
`solver`	`character, optional` Optimization algorithm. Possible values: `auto` - Automatically determined from data and other parameters. `newton` - Newton iteration method. `cyclical` - Cyclical coordinate descent method to fit elastic net regularized logistic regression. `lbfgs` - LBFGS method. Recommended when there are many independent variables. `stochastic` - Stochastic gradient descent method. Recommended when dealing with a very large dataset. `proximal` - Proximal gradient descent method to fit elastic net regularized logistic regression. All values are available when `multi.class` is FALSE; otherwise only 'lbfgs' and 'cyclical' are available. Defaults to 'auto' when `multi.class` is FALSE, and 'lbfgs' when `multi.class` is TRUE.
`max.iter`	`integer, optional` Maximum number of iterations taken for the solvers to converge. If convergence is not reached after this number, an error will be generated. For multi-class the default is 100. For binary-class the default is 100000 when solver is 'cyclical', 1000 when solver is 'proximal', and 100 otherwise.
`thread.ratio`	`double, optional` Specifies the ratio of total number of threads that can be used by this function. The value range is from 0 to 1, where 0 indicates a single thread, and 1 indicates all the currently available threads. Values outside this range tell PAL to heuristically determine the number of threads to use. Only valid when multi.class is FALSE. Defaults to 1.0 for logistic regression and 0.0 for predict and score.
`standardize`	`logical, optional` Controls whether to standardize the data to have zero mean and unit variance. `FALSE` - Does not standardize the data. `TRUE` - Standardizes the data to zero mean and unit variance. Defaults to TRUE.
`max.pass.number`	`integer, optional` The maximum number of passes over the data. Only valid when solver is 'stochastic' and multi.class is FALSE. Defaults to 1.
`lbfgs.m`	`integer, optional` Number of past updates to be kept. Only available when solver is 'lbfgs'. Defaults to 6.
`pmml.export`	`'no', 'single-row', 'multi-row'` Controls whether to output a PMML representation of the model and how to format the PMML. Case-insensitive. For multi-class, valid options are: `'no'` - No PMML model. `'multi-row'` - Exports a PMML model, splitting it across multiple rows if it doesn't fit in one. For binary-class, valid options are: `'no'` - No PMML model. `'single-row'` - Exports a PMML model in a maximum of one row. Fails if the model doesn't fit in one row. `'multi-row'` - Exports a PMML model, splitting it across multiple rows if it doesn't fit in one. Defaults to 'no'.
`stat.inf`	`logical, optional` Indicates whether to calculate statistical inference from the given data. `FALSE` - Does not calculate statistical inference. `TRUE` - Calculates statistical inference. Defaults to FALSE.
`categorical.variable`	`character or list of characters, optional` Column names in the data table to be used as categorical variables.
`class.map0`	`character, optional` Categorical label to map to 0. Only valid when `multi.class` is FALSE. Mandatory when the label column type is VARCHAR or NVARCHAR during binary-class fit and score.
`class.map1`	`character, optional` Categorical label to map to 1. Only valid when `multi.class` is FALSE. Mandatory when the label column type is VARCHAR or NVARCHAR during binary-class fit and score.
`multi.class`	`logical, optional` If set to TRUE, a multi-class classification is performed. Otherwise, there must be only two classes. Defaults to FALSE.
`sgd.batch.number`	`integer, optional` The batch number for the stochastic gradient descent method. Valid only when `multi.class` is FALSE and `solver` is 'stochastic'. Defaults to 1.
`precompute`	`logical, optional` Whether or not to precompute the Gram matrix for the cyclical coordinate descent method. Valid only when `solver` is 'cyclical'. Defaults to TRUE.
`handle.missing`	`logical, optional` Whether or not to impute the missing values of the input training data. Defaults to TRUE.
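
The relationship between formula and the features/label pair can be sketched in base R. The helper below, splitFormula, is a hypothetical illustration and not part of hana.ml.r; how the wrapper itself parses the formula is not stated here, so treat this only as a way to see which column names a formula stands for.

```r
# Hypothetical helper, not part of hana.ml.r: derive the label and feature
# names that a two-sided formula such as CATEGORY~V1+V2+V3 stands for.
splitFormula <- function(formula) {
  stopifnot(inherits(formula, "formula"), length(formula) == 3)
  list(
    label    = all.vars(formula[[2]]),           # left-hand side
    features = as.list(all.vars(formula[[3]]))   # right-hand side
  )
}

parts <- splitFormula(CATEGORY ~ V1 + V2 + V3)
parts$label             # the name that would be passed as `label`
unlist(parts$features)  # the names that would be passed as `features`
```

Under this reading, formula = CATEGORY~V1+V2+V3 and label = 'CATEGORY' with features = list('V1', 'V2', 'V3') describe the same model, which is why only one of the two forms should be supplied.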

Format

R6Class object.

Value

A "LogisticRegression" object with the following attributes:

result: DataFrame
Coefficient values for the logistic regression model (together with z-scores and p-values).
pmml: DataFrame
LogisticRegression model in PMML format.
statistic.info: DataFrame
Related statistics for the logistic regression model and its solving process, including AIC, objective-value, log-likelihood, number of iterations used, solution status, etc.
optimal.param: DataFrame
Optimal model parameters selected. Reserved for model selection using cross-validation.

Note

Using Summary and Print

Summary provides a general summary of the output of the model. Usage: summary(lr) where lr is a LogisticRegression object initialized with training data.

Print provides information on the coefficients and the optional parameter values given by the user. Usage: print(lr) where lr is a LogisticRegression object initialized with training data.

Examples

## Not run: 
# Fit with default settings
lr = hanaml.LogisticRegression(conn.context = conn, data = df1)
## OR
# Fit with a formula and explicit solver settings
lr = hanaml.LogisticRegression(conn.context = conn, data = df1,
            formula = CATEGORY~V1+V2+V3,
            solver='newton', thread.ratio=0.1, max.iter=1000,
            categorical.variable='V3', pmml.export='single-row',
            stat.inf=TRUE, tol=0.000001)

## End(Not run)
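
For a multi-class fit, the Arguments section restricts solver to 'lbfgs' or 'cyclical' and pmml.export to 'no' or 'multi-row'. A minimal sketch of such a call follows. The names conn and df1 are assumed to exist as in the examples above, the specific values are illustrative only, and the call is skipped when no connection object is defined.

```r
# Arguments for a multi-class fit, limited to options the Arguments section
# lists as valid when multi.class is TRUE. Values are illustrative.
multi.args <- list(
  formula     = CATEGORY ~ V1 + V2 + V3,
  multi.class = TRUE,
  solver      = "lbfgs",
  lbfgs.m     = 6,
  max.iter    = 100,
  pmml.export = "multi-row"
)

# `conn` and `df1` are assumptions carried over from the examples above.
# Without them the fit is not attempted.
if (exists("conn") && exists("df1")) {
  lr.multi <- do.call(hanaml.LogisticRegression,
                      c(list(conn.context = conn, data = df1), multi.args))
}
```

thread.ratio, class.map0 and class.map1 are left out because they are documented as valid only when multi.class is FALSE.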

[Package hana.ml.r version 1.0.8 Index]