HANA Support Vector Machine

Properties that can be configured for the HANA Support Vector Machine algorithm.

Syntax

Support Vector Machines (SVMs) are a family of supervised learning models built on the concept of support vectors. Compared with many other supervised learning models, SVMs have the advantage that the models they produce can be either linear or non-linear; the latter is achieved through a technique called the kernel trick.

Like most supervised models, SVMs have a training phase and a testing phase. In the training phase, a function f: x → y is learned, where f(∙) is a (possibly non-linear) function that maps a sample onto a TARGET. The training set consists of pairs {x_i, y_i}, where x_i denotes a sample represented by several attributes and y_i denotes its TARGET (the supervised information). In the testing phase, the learned f(∙) is used to map a sample with an unknown TARGET onto its predicted TARGET.
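
The two phases can be illustrated with a minimal sketch in plain Python. It is not the PAL implementation: it assumes a linear model f(x) = w·x + b, labels of +1 and -1, and plain subgradient descent on the textbook soft-margin objective. The function names and the values of c, lr, and epochs are hypothetical and chosen only for illustration.

```python
def train_linear_svc(samples, targets, c=1.0, lr=0.01, epochs=200):
    """Training phase: learn (w, b) for f(x) = w.x + b by subgradient descent
    on 0.5 * ||w||^2 + c * sum(max(0, 1 - y * f(x)))."""
    n_features = len(samples[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the 0.5 * ||w||^2 term
        grad_b = 0.0
        for x, y in zip(samples, targets):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1.0:  # sample lies inside the margin or is misclassified
                for j in range(n_features):
                    grad_w[j] -= c * y * x[j]
                grad_b -= c * y
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(model, x):
    """Testing phase: map a sample with unknown TARGET onto a predicted label."""
    w, b = model
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Training set: pairs {x_i, y_i} with two attributes per sample.
train_x = [[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]]
train_y = [1, 1, -1, -1]

model = train_linear_svc(train_x, train_y)
label_a = predict(model, [2.5, 1.5])    # sample on the positive side
label_b = predict(model, [-1.0, -4.0])  # sample on the negative side
```

A production solver would typically work on the dual problem and support kernels; the sketch only shows where learning f(∙) ends and applying it begins.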

In the current implementation in the SAP HANA Predictive Analysis Library (PAL), SVMs can be used for the following three tasks:
  • Support Vector Classification (SVC)

    Classification is one of the most common tasks in many fields, including machine learning, data mining, computer vision, and business data analysis. Compared with linear classifiers such as logistic regression, SVC can produce a non-linear decision boundary, which leads to better accuracy on some real-world datasets. In the classification scenario, f(∙) refers to the decision function, and the TARGET refers to a "label" represented by a real number.

  • Support Vector Regression (SVR)

    SVR is another method for regression analysis. Compared with classical linear regression methods such as least squares regression, the regression function in SVR can be non-linear. In the regression scenario, f(∙) refers to the regression function, and the TARGET refers to a "response" represented by a real number.

  • Support Vector Ranking

    Support Vector Ranking implements a pairwise "learning to rank" algorithm, which learns a ranking function from several sets of ranked samples, where the sets are distinguished by Query ID. In the ranking scenario, f(∙) refers to the ranking function, and the TARGET refers to a score according to which the final ranking is made. For pairwise ranking, f(∙) is learned so that the pairwise relationships expressing the rank of the samples within each set are taken into account.
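
The pairwise idea behind Support Vector Ranking can be sketched as follows. This is the common textbook reduction (often called RankSVM), in which each pair of samples with the same Query ID and different scores becomes one difference vector; whether PAL builds its pairs exactly this way is an assumption. The function name pairwise_differences and the sample data are hypothetical.

```python
from collections import defaultdict


def pairwise_differences(features, scores, query_ids):
    """Return (x_i - x_j, +1) pairs for samples that share a Query ID and
    where sample i has a higher score than sample j."""
    groups = defaultdict(list)
    for x, s, q in zip(features, scores, query_ids):
        groups[q].append((x, s))

    pairs = []
    for members in groups.values():
        for xi, si in members:
            for xj, sj in members:
                if si > sj:  # only ordered pairs within the same set
                    diff = [a - b for a, b in zip(xi, xj)]
                    pairs.append((diff, 1))
    return pairs


features = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [2.0, 2.0], [1.0, 1.0]]
scores = [3, 2, 1, 2, 1]
query_ids = ["q1", "q1", "q1", "q2", "q2"]

pairs = pairwise_differences(features, scores, query_ids)
```

Samples from different Query IDs are never compared, which is why the Query ID column is needed for this task.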

Because non-linearity is achieved through the kernel trick, the kernel type and its parameters must be specified in addition to the datasets.
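
As an illustration of what a kernel and its parameters are, the sketch below implements three textbook kernels in plain Python and checks the kernel trick for a degree-2 polynomial kernel. The parameter names gamma, degree, linear_coef, and coef_const are hypothetical; how they correspond to the product's property names is an assumption, not something stated in this document.

```python
import math


def dot(x, z):
    return sum(a * b for a, b in zip(x, z))


def linear_kernel(x, z):
    return dot(x, z)


def polynomial_kernel(x, z, degree=3, linear_coef=1.0, coef_const=0.0):
    # Textbook form: (linear_coef * <x, z> + coef_const) ** degree
    return (linear_coef * dot(x, z) + coef_const) ** degree


def rbf_kernel(x, z, gamma=1.0):
    # Textbook form: exp(-gamma * ||x - z||^2)
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def explicit_degree2_map(x):
    """Explicit feature map whose dot product equals (<x, z>) ** 2 in 2-D."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]


x, z = [1.0, 2.0], [3.0, 4.0]

# Kernel trick: the kernel value equals a dot product in the mapped space,
# but is computed without ever constructing the mapped vectors.
via_kernel = polynomial_kernel(x, z, degree=2)
via_mapping = dot(explicit_degree2_map(x), explicit_degree2_map(z))
same_value = math.isclose(via_kernel, via_mapping)
```

The model stays linear in the mapped space, while the decision boundary in the original attribute space becomes non-linear.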

HANA Support Vector Machine Properties
Table 1: Algorithm Properties
Property: Description
Algorithm Type: Select the type of analysis the algorithm should perform.
  • Classification
  • Regression
  • Ranking
Output Mode: Select the mode in which you want to use the output of this algorithm.
Features: Select the input columns with which you want to perform the analysis.
Target Variable: Select the target column on which you want to perform the analysis.
Query ID: Select the Query ID column to use for Ranking.
Missing Values: Select the method for handling missing values.
Possible values:
  • Ignore: The algorithm skips the records containing missing values in the independent or dependent columns.
  • Keep: The algorithm retains the records containing missing values during calculation.
Kernel Type: Select the kernel type.
Gamma: Enter the gamma coefficient for the RBF kernel.
Maximum Margin: Enter the trade-off value between the training error and the margin.
Degree: Enter the degree for the polynomial kernel. The default value is 3.
Linear Coefficient: Enter a value for the linear coefficient.
Coefficient Constant: Enter a value for the coefficient constant.
Cross Validation: Select this option to use cross validation for calculation.
Normalization Type: Select the type of normalization.
Number of Threads: Enter the number of threads the algorithm should use for execution. The default value is 1.
Predicted Column Name: Enter a name for the newly created column that contains predicted values.
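
The trade-off described for the Maximum Margin property can be illustrated with the textbook soft-margin objective, 0.5 * ||w||^2 + C * (sum of hinge losses). The sketch below assumes that the property value plays the role of C, which this document does not confirm. The data, the two candidate models, and the function name are hypothetical.

```python
def soft_margin_objective(w, b, samples, targets, c):
    """Return (margin_term, error_term, total) for a linear model (w, b)."""
    margin_term = 0.5 * sum(wi * wi for wi in w)  # small value = wide margin
    error_term = 0.0
    for x, y in zip(samples, targets):
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        error_term += max(0.0, 1.0 - y * score)  # hinge loss per sample
    return margin_term, error_term, margin_term + c * error_term


# One attribute per sample; the last negative sample sits close to the positives.
samples = [[-3.0], [-2.0], [2.0], [3.0], [0.5]]
targets = [-1, -1, 1, 1, -1]

wide = ([0.5], 0.0)                  # wide margin, misclassifies the last sample
narrow = ([4.0 / 3.0], -5.0 / 3.0)   # narrow margin, fits every sample

for c in (0.1, 10.0):
    wide_total = soft_margin_objective(*wide, samples, targets, c)[2]
    narrow_total = soft_margin_objective(*narrow, samples, targets, c)[2]
    preferred = "wide" if wide_total < narrow_total else "narrow"
    print(f"C={c}: wide={wide_total:.3f}, narrow={narrow_total:.3f}, preferred={preferred}")
```

With a small value, the wide-margin model has the lower objective even though it makes a training error; with a large value, the model that fits every training sample is preferred.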