hanaml.Knn {hana.ml.r}    R Documentation

K-Nearest Neighbor (KNN)

Description

hanaml.Knn is an R wrapper for PAL KNN.

Usage

hanaml.Knn (conn.context, data = NULL, key = NULL, features = NULL,
           label = NULL, n.neighbors = NULL, thread.ratio = NULL,
           attribute.num = NULL, voting.type = NULL,
           stat.info = TRUE, metric = NULL, minkowski.power = NULL,
           algorithm = NULL)

Arguments

conn.context

ConnectionContext
The connection to the SAP HANA system.

data

DataFrame
DataFrame containing the data.

key

character
Name of the ID column of data.

features

list of character, optional
Names of the feature columns. If features is not provided, it defaults to all non-ID, non-label columns.

label

character
Name of the column in data that specifies the dependent variable.

n.neighbors

integer, optional
Number of nearest neighbors.

Defaults to 1.

thread.ratio

double, optional
Controls the proportion of available threads to use. The value range is from 0 to 1, where 0 indicates a single thread, and 1 indicates up to all available threads. Values between 0 and 1 will use up to that percentage of available threads. Values outside this range tell PAL to heuristically determine the number of threads to use.

Defaults to 0.

attribute.num

integer, optional
Number of attributes. No default value.

voting.type

("majority", "distance-weighted"), optional
Method used to vote for the most frequent label of the K nearest neighbors.

Defaults to 'distance-weighted'.

stat.info

logical, optional
Controls whether to return a statistics table containing the distance between each point in the prediction set and its k nearest neighbors in the training set. If TRUE, the table is returned.

Defaults to TRUE.

metric

("manhattan", "euclidean", "minkowski", "chebyshev"), optional
Metric used to compute the distance between data points.

Defaults to 'euclidean'.

minkowski.power

double, optional
When metric is 'minkowski', this parameter sets the power (exponent) used in the distance calculation.

Defaults to 3.0.

algorithm

("brute-force", "kd-tree"), optional
Algorithm used to compute the nearest neighbors.

Defaults to 'brute-force'.
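
The four metric options listed above correspond to standard distance definitions. The following base R sketch illustrates them locally; it is not part of hana.ml.r or PAL, and point_distance is a hypothetical helper name introduced only for illustration. PAL's internal computation may differ in detail.

```r
# Illustrative only: local base R versions of the standard distance
# definitions. point_distance is a hypothetical name, not a hana.ml.r function.
point_distance <- function(a, b,
                           metric = c("euclidean", "manhattan",
                                      "minkowski", "chebyshev"),
                           minkowski.power = 3.0) {
  metric <- match.arg(metric)
  d <- abs(a - b)  # per-feature absolute differences
  switch(metric,
         manhattan = sum(d),
         euclidean = sqrt(sum(d^2)),
         minkowski = sum(d^minkowski.power)^(1 / minkowski.power),
         chebyshev = max(d))
}

a <- c(1, 1)
b <- c(4, 5)
point_distance(a, b, "manhattan")                        # 7
point_distance(a, b, "euclidean")                        # 5
point_distance(a, b, "chebyshev")                        # 4
point_distance(a, b, "minkowski", minkowski.power = 2)   # 5, matches euclidean
```

With minkowski.power = 1 the Minkowski distance reduces to Manhattan, and with minkowski.power = 2 to Euclidean, which is why the power only matters when metric is 'minkowski'.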

Format

R6Class object.

Details

K-Nearest Neighbor (KNN) is a memory-based classification or regression method with no explicit training phase. It assumes similar instances should have similar labels or values.
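
To make the idea concrete, here is a minimal base R sketch of KNN classification that runs locally on a small data set with the same columns as the Examples section. It is not the PAL implementation: knn_classify is a hypothetical helper, Euclidean distance is hard-coded, and the inverse-distance weighting used for "distance-weighted" voting is an assumption about how such voting is commonly done.

```r
# Illustrative only: a local brute-force KNN classifier in base R.
train <- data.frame(
  ID   = 0:9,
  X1   = c(1, 10, 10, 10, 1000, 1000, 1000, 999, 999, 1000),
  X2   = c(1, 10, 11, 10, 1000, 1001, 999, 999, 1000, 1000),
  TYPE = c(2, 3, 3, 3, 1, 1, 1, 1, 1, 1)
)

# knn_classify is a hypothetical name, not a hana.ml.r function.
knn_classify <- function(train, query, features, label, n.neighbors = 1,
                         voting.type = c("distance-weighted", "majority")) {
  voting.type <- match.arg(voting.type)
  x <- as.matrix(train[, features, drop = FALSE])
  # Euclidean distance from the query point to every training row
  dists <- sqrt(rowSums(sweep(x, 2, query)^2))
  nearest <- order(dists)[seq_len(n.neighbors)]
  labels <- train[[label]][nearest]
  # Assumption: distance-weighted voting uses inverse-distance weights;
  # the small constant avoids division by zero for exact matches.
  weights <- if (voting.type == "majority") {
    rep(1, n.neighbors)
  } else {
    1 / (dists[nearest] + 1e-12)
  }
  votes <- tapply(weights, labels, sum)  # total vote weight per label
  names(votes)[which.max(votes)]         # winning label, as a character string
}

knn_classify(train, c(9, 9), c("X1", "X2"), "TYPE",
             n.neighbors = 3, voting.type = "majority")  # returns "3"
```

The query point (9, 9) lies next to the three rows labeled 3, so all three neighbors vote for that label. There is no fitting step beyond storing the training rows, which is what "memory-based" means here.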

Value

A "Knn" object.

Examples

## Not run: 
## Training data:

> df$Collect()
    ID      X1      X2  TYPE
 0   0     1.0     1.0     2
 1   1    10.0    10.0     3
 2   2    10.0    11.0     3
 3   3    10.0    10.0     3
 4   4  1000.0  1000.0     1
 5   5  1000.0  1001.0     1
 6   6  1000.0   999.0     1
 7   7   999.0   999.0     1
 8   8   999.0  1000.0     1
 9   9  1000.0  1000.0     1

> knn <- hanaml.Knn(connection.context, df, key="ID", features=list("X1", "X2"),
                   label="TYPE", n.neighbors=3, voting.type="majority",
                   thread.ratio=0.1, stat.info=FALSE)

## End(Not run)

[Package hana.ml.r version 1.0.8 Index]