Support Vector Classification (SVC)

hanaml.SVC is an R wrapper of SAP HANA PAL SVM for classification.

hanaml.SVC(
  data = NULL,
  key = NULL,
  features = NULL,
  label = NULL,
  kernel = NULL,
  thread.ratio = NULL,
  degree = NULL,
  gamma = NULL,
  coef.lin = NULL,
  coef.const = NULL,
  c = NULL,
  scale.info = NULL,
  shrink = NULL,
  handle.missing = NULL,
  categorical.variable = NULL,
  category.weight = NULL,
  tol = NULL,
  evaluation.seed = NULL,
  probability = NULL,
  compression = NULL,
  max.bits = NULL,
  max.quantization.iter = NULL,
  resampling.method = NULL,
  evaluation.metric = NULL,
  fold.num = NULL,
  repeat.times = NULL,
  param.search.strategy = NULL,
  random.search.times = NULL,
  random.state = NULL,
  timeout = NULL,
  progress.indicator.id = NULL,
  parameter.range = NULL,
  parameter.values = NULL
)

Arguments

data	`DataFrame` DataFrame containing the input data.
key	`character, optional` Name of the ID column. If not provided, the data is assumed to have no ID column. No default value.
features	`character or list of characters, optional` Names of the feature columns. If not provided, defaults to all non-key, non-label columns of `data`.
label	`character, optional` Name of the column which specifies the dependent variable. Defaults to the last column of data if not provided.
kernel	`{"linear", "poly", "rbf", "sigmoid"}, optional` Kernel function. Defaults to "rbf".
thread.ratio	`double, optional` Controls the proportion of available threads that can be used by this function. The value range is from 0 to 1, where 0 indicates a single thread and 1 indicates all available threads. Values between 0 and 1 use up to that proportion of the available threads. Values outside this range are ignored. Defaults to 0.
degree	`integer, optional` Degree of the 'poly' kernel function. Only valid when kernel = 'poly'. Value range: >= 1. Defaults to 3.
gamma	`double, optional` Coefficient for the 'rbf' kernel function. Only valid when kernel = 'rbf'. Defaults to 1.0/number of features in the dataset.
coef.lin	`double, optional` Coefficient for the 'poly' or 'sigmoid' kernel function. Only valid when kernel = 'poly' or 'sigmoid'. Defaults to 0.
coef.const	`double, optional` Coefficient for the 'poly' or 'sigmoid' kernel function. Only valid when kernel = 'poly' or 'sigmoid'. Defaults to 0.
c	`double, optional` Trade-off between training error and margin. Value range: > 0. Defaults to 100.
scale.info	`character, optional` Scaling method applied to the features: "no" applies no scaling; "standardization" transforms the data to have zero mean and unit variance; "rescale" rescales the range of each feature to [-1, 1]. Defaults to "standardization".
shrink	`logical, optional` Whether to use the shrink strategy, which may accelerate the training process. TRUE: uses the shrink strategy; FALSE: does not. Defaults to TRUE.
handle.missing	`logical, optional` Whether to impute the missing values of the input data or not. If set to FALSE, all rows with missing values will be deleted. Defaults to TRUE.
categorical.variable	`character or list/vector of characters, optional` Indicates which features should be treated as categorical variables. By default, the treatment depends on the column type: "VARCHAR" and "NVARCHAR" columns are categorical, while "INTEGER" and "DOUBLE" columns are continuous. Valid only for columns of "INTEGER" type; ignored otherwise. No default value.
category.weight	`double, optional` Represents the weight of category attributes. The value must be greater than 0. Defaults to 0.707.
tol	`double, optional` Specifies the error tolerance in the training process. The value must be greater than 0. Defaults to 0.001.
evaluation.seed	`integer, optional (deprecated)` The random seed for parameter selection (same as `random.state`). The value must be no less than 0. If set to 0, the system time is used for random generation. Defaults to 0. If `evaluation.seed` and `random.state` are both set, `random.state` takes priority.
probability	`logical, optional` Whether to output probabilities when scoring. Defaults to FALSE.
compression	`logical, optional` Specifies whether the model is stored in compressed format. The default value depends on the SAP HANA version; please refer to the corresponding SAP HANA PAL documentation.
max.bits	`integer, optional` The maximum number of bits used to quantize continuous features, equivalent to using 2^max.bits bins. Must be less than 31. Valid only when `compression` is TRUE. Defaults to 12.
max.quantization.iter	`integer, optional` The maximum number of iterations for quantization. Valid only when `compression` is TRUE. Defaults to 1000.
resampling.method	`character, optional` Specifies the resampling method. Valid options are "cv", "stratified_cv", "bootstrap" and "stratified_bootstrap". If no value is specified, neither model evaluation nor parameter selection is activated.
evaluation.metric	`character, optional` Specifies the evaluation metric for model evaluation or parameter selection. Currently valid options are: "accuracy", "f1_score", "auc", "nll".
fold.num	`integer, optional` Specifies the number of folds for cross-validation (cv). Mandatory and valid only when `resampling.method` is 'cv' or 'stratified_cv'.
repeat.times	`numeric, optional` Specifies the number of repeat times for resampling. Defaults to 1.
param.search.strategy	`c("grid", "random"), optional` Specifies the method used for parameter selection. If not specified, parameter selection is not triggered.
random.search.times	`integer, optional` Specifies the number of times to randomly select candidate parameters for selection. Mandatory and valid only when `param.search.strategy` is "random".
random.state	`numeric, optional` Specifies the seed for random generation. Use system time when 0 is specified.
timeout	`integer, optional` Specifies the maximum running time, in seconds, for model evaluation or parameter selection. No timeout when 0 is specified.
progress.indicator.id	`character, optional` Sets the ID of the progress indicator for model evaluation or parameter selection. No progress indicator is active if no value is provided.
parameter.range	`list, optional` Specifies the ranges of the following parameters for parameter selection: `c, gamma, degree, nu, coef.lin, coef.const`. Each range is specified by 3 numbers in the form c(start, step, end). Example: parameter.range <- list(c = c(50, 10, 100)) takes `c` values from 50 to 100 with step size 10, i.e. 50, 60, 70, 80, 90, 100. If `param.search.strategy` is "random", the middle term (step) has no effect and can be omitted.
parameter.values	`list, optional` Specifies values of the following parameters for parameter selection: `c, gamma, degree, nu, coef.lin, coef.const`. Example: parameter.values <- list(gamma = c(0.01, 0.05, 0.07))
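
The `parameter.range` and `parameter.values` formats above can be illustrated in plain R. The following is a minimal sketch, assuming that a "grid" search evaluates every combination (the Cartesian product) of the candidate values; `expand_range()` is a hypothetical helper for illustration only and is not part of the package. The sketch does not call SAP HANA.

```r
# Hypothetical helper (not part of the package): turn a c(start, step, end)
# triple into the candidate values it describes.
expand_range <- function(r) {
  stopifnot(length(r) == 3)
  seq(from = r[1], to = r[3], by = r[2])
}

parameter.range  <- list(c = c(50, 10, 100))
parameter.values <- list(gamma = c(0.01, 0.05, 0.07))

# Candidate values per parameter: ranges are expanded, value lists kept as-is.
candidates <- c(lapply(parameter.range, expand_range), parameter.values)

# ASSUMPTION: a "grid" search tries every combination of the candidates.
grid <- expand.grid(candidates)

print(candidates$c)  # 50 60 70 80 90 100
print(nrow(grid))    # 6 values of c times 3 values of gamma = 18 rows
```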

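The two active `scale.info` options can be sketched in base R as follows. This is an illustration, not the PAL implementation: it assumes "rescale" is a per-column min-max mapping onto [-1, 1], and it uses R's `scale()`, which divides by the sample standard deviation; PAL may use a different variance estimator or treat constant columns differently. `scale_features()` is a hypothetical helper.

```r
# Hypothetical helper illustrating the scale.info options on a feature table.
scale_features <- function(X, scale.info = "standardization") {
  X <- as.matrix(X)
  switch(scale.info,
         no = X,
         # zero mean, unit (sample) variance per column
         standardization = scale(X, center = TRUE, scale = TRUE),
         # ASSUMPTION: min-max map of each column onto [-1, 1];
         # a constant column would give NaN here
         rescale = apply(X, 2, function(col) {
           rng <- range(col)
           2 * (col - rng[1]) / (rng[2] - rng[1]) - 1
         }),
         stop("unknown scale.info: ", scale.info))
}

X <- data.frame(f1 = c(1, 2, 3, 4), f2 = c(10, 20, 30, 40))
print(scale_features(X, "rescale"))  # each column: -1, -1/3, 1/3, 1
print(colMeans(scale_features(X)))   # approximately 0 for every column
```
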
Value

Returns an "SVC" object with the following values:

model : DataFrame
Model content.
stat : DataFrame
Statistics.

Examples

Call the function:

> svc <- hanaml.SVC(data, key = "ID", gamma = 0.005)
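
The call above uses the default "rbf" kernel with `gamma = 0.005`. To show how `kernel`, `degree`, `gamma`, `coef.lin` and `coef.const` enter the computation, here is a minimal base-R sketch of the four kernel functions for two feature vectors. It assumes the common libsvm-style definitions, with `coef.lin` scaling the inner product and `coef.const` added to it; the exact formulas used by SAP HANA PAL should be checked in its documentation. `svm_kernel()` is a hypothetical helper and does not call SAP HANA.

```r
# Hypothetical helper: kernel value k(x, y) for two numeric feature vectors.
# Defaults mirror the documented ones; gamma defaults to 1 / number of features.
# ASSUMPTION: libsvm-style forms for "poly" and "sigmoid".
svm_kernel <- function(x, y, kernel = "rbf", degree = 3,
                       gamma = 1 / length(x), coef.lin = 0, coef.const = 0) {
  ip <- sum(x * y)  # inner product <x, y>
  switch(kernel,
         linear  = ip,
         poly    = (coef.lin * ip + coef.const)^degree,
         rbf     = exp(-gamma * sum((x - y)^2)),
         sigmoid = tanh(coef.lin * ip + coef.const),
         stop("unknown kernel: ", kernel))
}

x <- c(1, 2, 3)
y <- c(2, 0, 1)
svm_kernel(x, y, kernel = "linear")               # 1*2 + 2*0 + 3*1 = 5
svm_kernel(x, y, kernel = "rbf", gamma = 0.005)   # exp(-0.005 * 9)
svm_kernel(x, y)                                  # rbf, gamma = 1/3: exp(-3)
svm_kernel(x, y, kernel = "poly", degree = 2,
           coef.lin = 1, coef.const = 1)          # (1 * 5 + 1)^2 = 36
```

A small `gamma` such as 0.005 makes the rbf kernel decay slowly with squared distance, so even fairly distant points keep a kernel value close to 1; larger values make the kernel more local.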
