Properties that can be configured for the HANA Support Vector Machine algorithm.
Support Vector Machines (SVMs) are a family of supervised learning models built on the concept of support vectors. Compared with many other supervised learning models, SVMs have the advantage that the models they produce can be either linear or non-linear, where the latter is realized by a technique called the Kernel Trick.
Like most supervised models, SVMs have a training phase and a testing phase. In the training phase, a function f(x) → y is learnt, where f(∙) is a (possibly non-linear) function mapping a sample onto a TARGET. The training set consists of pairs denoted by {xi, yi}, where x denotes a sample represented by several attributes, and y denotes a TARGET (the supervised information). In the testing phase, the learnt f(∙) is used to map a sample with an unknown TARGET onto its predicted TARGET.
Classification is one of the most frequent tasks in many fields, including machine learning, data mining, computer vision, and business data analysis. Compared with linear classifiers such as logistic regression, SVC can produce a non-linear decision boundary, which leads to better accuracy on some real-world datasets. In the classification scenario, f(∙) refers to the decision function, and a TARGET refers to a "label" represented by a real number.
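To make the training/testing phases and the decision function concrete, here is a minimal sketch in plain Python: a linear SVC trained with Pegasos-style sub-gradient descent on hinge loss. This is an illustrative stand-in, not HANA's implementation (which also supports kernels); the toy data, the `lam` regularization value, and the bias-feature trick are all assumptions made for the example.

```python
# Illustrative only: a minimal linear SVC via Pegasos-style sub-gradient
# descent. It shows the idea of a decision function f(x) = <w, x> + b
# learnt from {xi, yi} pairs with labels +1 / -1.

def train_linear_svc(samples, labels, lam=0.01, epochs=200):
    """samples: list of feature tuples; labels: +1 or -1."""
    n_features = len(samples[0]) + 1          # +1 for a bias feature
    w = [0.0] * n_features
    t = 0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            t += 1
            eta = 1.0 / (lam * t)             # Pegasos step size schedule
            xb = list(x) + [1.0]              # append constant bias feature
            margin = y * sum(wi * xi for wi, xi in zip(w, xb))
            # shrink w (regularization), then step along the hinge sub-gradient
            w = [(1 - eta * lam) * wi for wi in w]
            if margin < 1:
                w = [wi + eta * y * xi for wi, xi in zip(w, xb)]
    return w

def predict(w, x):
    """Testing phase: map a sample with unknown TARGET to a predicted label."""
    xb = list(x) + [1.0]
    score = sum(wi * xi for wi, xi in zip(w, xb))
    return 1 if score >= 0 else -1

# toy, linearly separable training set
X = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svc(X, y)
```

After training, `predict(w, x)` plays the role of the learnt f(∙) applied to new samples.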
SVR is another method for regression analysis. Compared with classical linear regression methods such as least squares regression, the regression function in SVR can be non-linear. In the regression scenario, f(∙) refers to the regression function, and a TARGET refers to a "response" represented by a real number.
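What distinguishes SVR from least squares is the epsilon-insensitive loss: residuals inside a tube of width epsilon around the regression function cost nothing. The sketch below fits a 1-D linear regression function with sub-gradient descent on that loss; the toy data, step-size schedule, and regularization value are assumptions for illustration only, and HANA's solver differs.

```python
# Illustrative only: 1-D SVR trained by sub-gradient descent on the
# epsilon-insensitive loss. Residuals with |r| <= eps contribute no loss.

def train_svr_1d(xs, ys, eps=0.1, lam=1e-4, lr=0.01, epochs=2000):
    w, b = 0.0, 0.0
    for epoch in range(epochs):
        step = lr / (1.0 + 0.005 * epoch)     # decaying step size
        gw, gb = lam * w, 0.0                 # L2 regularization on w
        for x, y in zip(xs, ys):
            r = (w * x + b) - y               # residual of f(x) = w*x + b
            if r > eps:                       # over-prediction outside the tube
                gw += x
                gb += 1.0
            elif r < -eps:                    # under-prediction outside the tube
                gw -= x
                gb -= 1.0
        w -= step * gw
        b -= step * gb
    return w, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 2.0, 4.0, 6.0, 8.0]               # noise-free response y = 2x
w, b = train_svr_1d(xs, ys)
```

On this noise-free data the learnt regression function lands close to y = 2x, with every training residual near or inside the epsilon tube.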
The ranking variant implements a pairwise "learning to rank" algorithm, which learns a ranking function from several sets (distinguished by Query ID) of ranked samples. In the ranking scenario, f(∙) refers to the ranking function, and a TARGET refers to a score, according to which the final ranking is made. For pairwise ranking, f(∙) is learnt so that the pairwise rank relationships among the samples within each set are respected.
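The pairwise idea can be sketched as a data transformation: within each Query ID group, every pair of samples with different scores becomes one difference vector, labeled by which sample ranks higher; an ordinary binary classifier on these pairs then induces a ranking function (the RankSVM reduction). This is a conceptual sketch, not HANA's internal procedure, and the row layout is an assumption.

```python
# Illustrative only: the pairwise reduction behind "learning to rank".
# Pairs are formed only within the same Query ID group; ties are skipped.

from itertools import combinations

def pairwise_transform(rows):
    """rows: list of (query_id, feature_tuple, score) tuples."""
    by_query = {}
    for qid, x, s in rows:
        by_query.setdefault(qid, []).append((x, s))
    pairs = []
    for group in by_query.values():
        for (xa, sa), (xb, sb) in combinations(group, 2):
            if sa == sb:
                continue                      # ties carry no rank information
            diff = tuple(a - b for a, b in zip(xa, xb))
            pairs.append((diff, 1 if sa > sb else -1))
    return pairs

rows = [
    ("q1", (1.0, 0.0), 3.0), ("q1", (0.0, 1.0), 1.0),
    ("q2", (2.0, 2.0), 5.0), ("q2", (1.0, 1.0), 5.0),  # tied scores: skipped
]
pairs = pairwise_transform(rows)              # -> [((1.0, -1.0), 1)]
```

Training an SVC on the difference vectors yields a weight vector w; scoring new samples by <w, x> then produces the final ranking.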
Because non-linearity is realized by the Kernel Trick, the kernel type and its parameters must be specified in addition to the datasets.
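For orientation, here is a sketch of two common kernel functions and how the table's parameters might map onto them. The exact formulas HANA uses are not stated here, so the mapping of Gamma, Degree, Linear Coefficient, and Coefficient Constant below is an assumption; consult the HANA documentation for the authoritative definitions.

```python
# Illustrative only: common RBF and polynomial kernel forms, with an
# assumed mapping of the properties Gamma, Degree, Linear Coefficient,
# and Coefficient Constant onto their parameters.

import math

def rbf_kernel(x, z, gamma):
    """k(x, z) = exp(-gamma * ||x - z||^2)  -- Gamma property."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def poly_kernel(x, z, degree=3, lin_coef=1.0, coef_const=0.0):
    """k(x, z) = (lin_coef * <x, z> + coef_const) ** degree
    -- Degree, Linear Coefficient, Coefficient Constant properties."""
    dot = sum(a * b for a, b in zip(x, z))
    return (lin_coef * dot + coef_const) ** degree

print(rbf_kernel((1.0, 2.0), (1.0, 2.0), gamma=0.5))   # identical points -> 1.0
print(poly_kernel((1.0, 2.0), (1.0, 2.0), degree=2))   # (1*5 + 0)^2 -> 25.0
```

A kernel replaces the inner product in the decision function, which is what lets a linear solver produce a non-linear boundary without ever computing the mapped features explicitly.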
| Property | Description |
|---|---|
| Algorithm Type | Select the type of analysis the algorithm should perform. |
| Output Mode | Select the mode in which you want to use the output of this algorithm. |
| Features | Select the input columns with which you want to perform the analysis. |
| Target Variable | Select the target column on which you want to perform the analysis. |
| Query ID | Select a Query ID column for Ranking. |
| Missing Values | Select the method for handling missing values. |
| Kernel Type | Select the kernel type. |
| Gamma | Enter the gamma coefficient for the RBF kernel. |
| Maximum Margin | Enter the trade-off value between the training error and the margin. |
| Degree | Enter a degree for the polynomial kernel. The default value is 3. |
| Linear Coefficient | Enter a value for the linear coefficient of the kernel. |
| Coefficient Constant | Enter a value for the coefficient constant of the kernel. |
| Cross Validation | Select this option to use cross validation for calculation. |
| Normalization Type | Select the type of normalization. |
| Number of Threads | Enter the number of threads the algorithm should use for execution. The default value is 1. |
| Predicted Column Name | Enter a name for the newly created column that contains the predicted values. |
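To show how these properties fit together, here is a small configuration sketch that gathers them into a plain dictionary with the defaults the table states. The key names, the default kernel choice, and the example values are assumptions for illustration; this is not an actual HANA API.

```python
# Illustrative only: the table's properties collected as a plain config
# dict. Key names and example values are assumptions, not a HANA API.

def make_svm_config(**overrides):
    config = {
        "Algorithm Type": "Classification",   # or "Regression" / "Ranking"
        "Kernel Type": "RBF",                 # example kernel choice
        "Gamma": 0.1,                         # RBF kernel coefficient
        "Maximum Margin": 1.0,                # training-error / margin trade-off
        "Degree": 3,                          # default stated in the table
        "Number of Threads": 1,               # default stated in the table
    }
    unknown = set(overrides) - set(config)
    if unknown:
        raise ValueError(f"unknown properties: {sorted(unknown)}")
    config.update(overrides)
    return config

cfg = make_svm_config(**{"Kernel Type": "Polynomial", "Degree": 4})
```

Validating overrides against the known property names catches typos early, which is the same reason a reference table like the one above lists every configurable property explicitly.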