hanaml.SVC is an R wrapper of SAP HANA PAL SVM for classification.

hanaml.SVC(
  data = NULL,
  key = NULL,
  features = NULL,
  label = NULL,
  kernel = NULL,
  thread.ratio = NULL,
  degree = NULL,
  gamma = NULL,
  coef.lin = NULL,
  coef.const = NULL,
  c = NULL,
  scale.info = NULL,
  shrink = NULL,
  handle.missing = NULL,
  categorical.variable = NULL,
  category.weight = NULL,
  tol = NULL,
  evaluation.seed = NULL,
  probability = NULL,
  compression = NULL,
  max.bits = NULL,
  max.quantization.iter = NULL,
  resampling.method = NULL,
  evaluation.metric = NULL,
  fold.num = NULL,
  repeat.times = NULL,
  param.search.strategy = NULL,
  random.search.times = NULL,
  random.state = NULL,
  timeout = NULL,
  progress.indicator.id = NULL,
  parameter.range = NULL,
  parameter.values = NULL
)

Arguments

data

DataFrame
DataFrame containing the data.

key

character, optional
Name of the ID column. If not provided, the data is assumed to have no ID column.
No default value.

features

character or list of characters, optional
Names of the feature columns.
If not provided, it defaults to all non-key, non-label columns of data.

label

character, optional
Name of the column which specifies the dependent variable.
Defaults to the last column of data if not provided.

kernel

{"linear", "poly", "rbf", "sigmoid"}, optional
Kernel function to use.
Defaults to "rbf".

thread.ratio

double, optional
Controls the proportion of available threads that can be used by this function.
The value range is from 0 to 1, where 0 indicates a single thread, and 1 indicates all available threads.
Values between 0 and 1 will use up to that percentage of available threads. Values outside this range are ignored.
Defaults to 0.

degree

integer, optional
Degree of the 'poly' kernel function.
Only valid when kernel = 'poly'. Value range: >= 1.
Defaults to 3.

gamma

double, optional
Coefficient for the 'rbf' kernel function.
Only valid when kernel = 'rbf'.
Defaults to 1.0/number of features in the dataset.

coef.lin

double, optional
Coefficient for the 'poly' or 'sigmoid' kernel function.
Only valid when kernel = 'poly' or 'sigmoid'.
Defaults to 0.

coef.const

double, optional
Coefficient for the 'poly' or 'sigmoid' kernel function.
Only valid when kernel = 'poly' or 'sigmoid'.
Defaults to 0.
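To make the roles of degree, gamma, coef.lin and coef.const concrete, here is a minimal base R sketch of the four kernels. It assumes the conventional SVM kernel forms; this page does not state the exact formulas PAL uses, so the function names and formulas below are illustrative assumptions rather than the library's implementation.

```r
# Conventional kernel forms (assumed; check the PAL reference for exact definitions).
linear.kernel <- function(x, y) {
  sum(x * y)
}
poly.kernel <- function(x, y, degree, coef.lin, coef.const) {
  (coef.lin * sum(x * y) + coef.const)^degree
}
rbf.kernel <- function(x, y, gamma) {
  exp(-gamma * sum((x - y)^2))
}
sigmoid.kernel <- function(x, y, coef.lin, coef.const) {
  tanh(coef.lin * sum(x * y) + coef.const)
}

x <- c(1, 2)
y <- c(3, 4)

k.linear <- linear.kernel(x, y)                                       # 1*3 + 2*4 = 11
k.poly   <- poly.kernel(x, y, degree = 2, coef.lin = 1, coef.const = 1)  # (11 + 1)^2 = 144
k.rbf    <- rbf.kernel(x, y, gamma = 0.5)                             # exp(-0.5 * 8) = exp(-4)
k.sig    <- sigmoid.kernel(x, y, coef.lin = 0.1, coef.const = 0)      # tanh(1.1)
```

Under these assumed forms, degree only affects 'poly', gamma only affects 'rbf', and coef.lin and coef.const affect both 'poly' and 'sigmoid', which matches the validity notes above.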

c

double, optional
Trade-off between training error and margin. Value range: > 0.
Defaults to 100.

scale.info

character, optional
Specifies the scaling applied to the feature data.

  • "no": No scaling.

  • "standardization": The algorithm transforms the data to have zero mean and unit variance.

  • "rescale": The algorithm rescales each feature to the range [-1, 1].

Defaults to "standardization".
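The two scaling options can be illustrated on a single numeric feature in base R. This is only a sketch of the usual definitions: it assumes the sample standard deviation and a linear min-max mapping, and PAL's exact computation (performed server-side) may differ in such details.

```r
x <- c(2, 4, 6, 8)

# "standardization": subtract the mean and divide by the standard deviation
# (the sample standard deviation is an assumption of this sketch).
standardized <- (x - mean(x)) / sd(x)

# "rescale": map the observed range of the feature linearly onto [-1, 1].
rescaled <- 2 * (x - min(x)) / (max(x) - min(x)) - 1

standardized.mean <- mean(standardized)
standardized.sd   <- sd(standardized)
rescaled.range    <- range(rescaled)
```

After these transformations the standardized values have mean 0 and standard deviation 1 (up to floating-point error), and the rescaled values span exactly -1 to 1.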

shrink

logical, optional
Decides whether to use the shrink strategy or not. Using the shrink strategy may accelerate the training process.

  • FALSE: Does not use shrink strategy.

  • TRUE: Uses shrink strategy.

Defaults to TRUE.

handle.missing

logical, optional
Whether to impute the missing values of the input data or not. If set to FALSE, all rows with missing values will be deleted.
Defaults to TRUE.

categorical.variable

character or list/vector of characters, optional
Indicates which features should be treated as categorical variables.
The default behavior is dependent on what input is given:

  • "VARCHAR" and "NVARCHAR": categorical

  • "INTEGER" and "DOUBLE": continuous.

Valid only for variables of "INTEGER" type; ignored otherwise.
No default value.

category.weight

double, optional
Represents the weight of category attributes. The value must be greater than 0.
Defaults to 0.707.

tol

double, optional
Specifies the error tolerance in the training process. The value must be greater than 0.
Defaults to 0.001.

evaluation.seed

integer, optional (deprecated)
The random seed used in parameter selection (same as random.state). The value must be no less than 0.
If set to 0, the system time is used for random generation.
Defaults to 0. If evaluation.seed and random.state are both set, random.state takes priority.
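The precedence between the deprecated evaluation.seed and random.state can be expressed as a small helper. The function resolve.seed below is hypothetical and exists only to illustrate the rule described above; it is not part of the package.

```r
# Hypothetical helper: returns the seed that would take effect.
resolve.seed <- function(evaluation.seed = NULL, random.state = NULL) {
  if (!is.null(random.state)) {
    return(random.state)      # random.state wins when both are given
  }
  if (!is.null(evaluation.seed)) {
    return(evaluation.seed)   # deprecated fallback
  }
  0                           # 0 means the system time is used
}

seed.both <- resolve.seed(evaluation.seed = 7, random.state = 42)
seed.old  <- resolve.seed(evaluation.seed = 7)
seed.none <- resolve.seed()
```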

probability

logical, optional
If you want to output probability when scoring, set this to TRUE.
Defaults to FALSE.

compression

logical, optional
Specifies whether the model is stored in compressed format. The default value depends on the SAP HANA version. Please refer to the corresponding SAP HANA PAL documentation.

max.bits

integer, optional
The maximum number of bits used to quantize continuous features, equivalent to using 2^max.bits bins. Must be less than 31. Valid only when compression is TRUE. Defaults to 12.
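As a quick arithmetic check, the number of bins implied by max.bits is 2 raised to the power max.bits. The snippet below only computes that count for the default value; it does not reproduce the quantization itself, whose procedure is not described on this page.

```r
max.bits <- 12            # the documented default
stopifnot(max.bits < 31)  # documented upper limit

n.bins <- 2^max.bits      # 4096 bins for the default setting
```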

max.quantization.iter

integer, optional
The maximum number of iteration steps for quantization. Valid only when compression is TRUE. Defaults to 1000.

resampling.method

character, optional
Specifies the resampling method for model evaluation or parameter selection.
Valid options are:
"cv", "stratified_cv", "bootstrap", "stratified_bootstrap".
If no value is specified, neither model evaluation nor parameter selection is activated.

evaluation.metric

character, optional
Specifies the evaluation metric for model evaluation or parameter selection.
Currently valid options are: "accuracy", "f1_score", "auc", "nll".

fold.num

integer, optional
Specifies the number of folds for cross-validation (cv). Mandatory and valid only when resampling.method is 'cv' or 'stratified_cv'.

repeat.times

numeric, optional
Specifies the number of repeat times for resampling.
Defaults to 1.

param.search.strategy

{"grid", "random"}, optional
Specifies the method used for parameter selection. If not specified, parameter selection is not triggered.

random.search.times

integer, optional
Specifies the number of times to randomly select candidate parameters for selection. Mandatory and valid only when param.search.strategy is "random".

random.state

numeric, optional
Specifies the seed for random generation.
Use system time when 0 is specified.

timeout

integer, optional
Specifies the maximum running time, in seconds, for model evaluation or parameter selection.
No timeout when 0 is specified.

progress.indicator.id

character, optional
Sets the ID of the progress indicator for model evaluation or parameter selection.
No progress indicator is active if no value is provided.

parameter.range

list, optional
Specifies range of the following parameters for parameter selection:
c, gamma, degree, nu, coef.lin, coef.const.
Parameter range should be specified by 3 numbers in the form of c(start, step, end).
Example:
parameter.range <- list(c = c(50, 10, 100)) takes c values from 50 to 100 with a step size of 10, i.e. 50, 60, 70, 80, 90, 100.
If param.search.strategy is 'random', the middle term (step) has no effect and can be omitted.

parameter.values

list, optional
Specifies values of the following parameters for parameter selection:
c, gamma, degree, nu, coef.lin, coef.const.
Example: parameter.values <- list(gamma = c(0.01, 0.05, 0.07))
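The following base R sketch shows how a c(start, step, end) range expands into candidate values and how it could combine with an explicit value list. The helper expand.range is hypothetical, and taking the cross product of the two parameters is an assumption about how a grid search enumerates candidates; the actual enumeration happens on the server.

```r
parameter.range  <- list(c = c(50, 10, 100))
parameter.values <- list(gamma = c(0.01, 0.05, 0.07))

# Hypothetical helper: expand c(start, step, end) into the values it denotes.
expand.range <- function(r) {
  seq(from = r[1], to = r[3], by = r[2])
}

c.values <- expand.range(parameter.range$c)   # 50 60 70 80 90 100

# Assumed grid: every combination of the c and gamma candidates.
candidates <- expand.grid(c = c.values, gamma = parameter.values$gamma)
n.candidates <- nrow(candidates)              # 6 * 3 = 18
```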

Value

Returns an "SVC" object with the following values:

  • model : DataFrame
    Model content.

  • stat : DataFrame
    Statistics.

Examples

Call the function:

> svc <- hanaml.SVC(data, key = "ID", gamma = 0.005)
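A fuller call that enables cross-validation and a grid search might look like the sketch below. It uses only arguments documented on this page; the wrapper function fit.svc.with.grid.search is hypothetical, and the column name "ID" and the choice of 5 folds are assumptions. The block only defines the wrapper: calling it requires a live SAP HANA connection and a DataFrame such as the data object used above.

```r
# Hypothetical wrapper; hanaml.SVC is only invoked when this function is called.
fit.svc.with.grid.search <- function(data) {
  hanaml.SVC(
    data = data,
    key = "ID",                        # assumed name of the ID column
    kernel = "rbf",
    resampling.method = "cv",
    fold.num = 5,                      # mandatory for "cv"; 5 is an arbitrary choice
    evaluation.metric = "accuracy",
    param.search.strategy = "grid",
    parameter.range = list(c = c(50, 10, 100)),
    parameter.values = list(gamma = c(0.01, 0.05, 0.07))
  )
}

# With a connected DataFrame `data`:
# svc <- fit.svc.with.grid.search(data)
```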

See also