R: K-Nearest Neighbor (KNN)

hanaml.Knn {hana.ml.r}

R Documentation

K-Nearest Neighbor (KNN)

Description

hanaml.Knn is an R wrapper for the K-Nearest Neighbor (KNN) algorithm of the SAP HANA Predictive Analysis Library (PAL).

Usage

hanaml.Knn (conn.context, data = NULL, key = NULL, features = NULL,
           label = NULL, n.neighbors = NULL, thread.ratio = NULL,
           attribute.num = NULL, voting.type = NULL,
           stat.info = TRUE, metric = NULL, minkowski.power = NULL,
           algorithm = NULL)

Arguments

`conn.context`	`ConnectionContext` The connection to the SAP HANA system.
`data`	`DataFrame` DataFrame containing the training data.
`key`	`character` Name of the ID column in `data`.
`features`	`list of character, optional` Names of the feature columns. If features is not provided, it defaults to all non-ID, non-label columns.
`label`	`character` Name of the column in `data` that specifies the dependent variable.
`n.neighbors`	`integer, optional` Number of nearest neighbors. Defaults to 1.
`thread.ratio`	`double, optional` Controls the proportion of available threads to use. Valid values range from 0 to 1, where 0 means a single thread and 1 means up to all available threads; values in between use up to that fraction of the available threads. Values outside this range tell PAL to determine the number of threads heuristically. Defaults to 0.
`attribute.num`	`integer, optional` Number of attributes. No default value.
`voting.type`	`("majority", "distance-weighted"), optional` Method used to derive the predicted label from the labels of the K nearest neighbors: 'majority' picks the most frequent label, while 'distance-weighted' weights each neighbor's vote so that closer neighbors count more. Defaults to 'distance-weighted'.
`stat.info`	`logical, optional` Controls whether to return a statistics table containing the distance between each point in the prediction set and its k nearest neighbors in the training set. If TRUE, the table is returned. Defaults to TRUE.
`metric`	`("manhattan", "euclidean", "minkowski", "chebyshev"), optional` Metric used to compute the distance between data points. Defaults to 'euclidean'.
`minkowski.power`	`double, optional` When metric is 'minkowski', this parameter sets the power of the Minkowski distance. Defaults to 3.0.
`algorithm`	`("brute-force", "kd-tree"), optional` Algorithm used to compute the nearest neighbors. Defaults to 'brute-force'.
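The four `metric` options name standard distance formulas. The base-R sketch below (no SAP HANA connection needed) shows how each is computed for two numeric vectors, assuming PAL uses the textbook definitions with `minkowski.power` acting as the power `p`. The helper `knn_distance` is hypothetical and not part of hana.ml.r; `stats::dist` is used only as a cross-check.

```r
# Hypothetical helper (not part of hana.ml.r): textbook distance metrics
# named after the values accepted by the `metric` argument.
knn_distance <- function(x, y, metric = "euclidean", p = 3.0) {
  d <- abs(x - y)  # element-wise absolute differences
  switch(metric,
    manhattan = sum(d),             # L1 norm
    euclidean = sqrt(sum(d^2)),     # L2 norm
    minkowski = sum(d^p)^(1 / p),   # Lp norm; p plays the role of minkowski.power
    chebyshev = max(d),             # L-infinity norm (largest coordinate gap)
    stop("unknown metric: ", metric)
  )
}

# Two points taken from the example training data below
a <- c(1, 1)
b <- c(10, 11)
sapply(c("manhattan", "euclidean", "minkowski", "chebyshev"),
       function(m) knn_distance(a, b, m))

# Cross-check against base R: dist() calls Chebyshev "maximum"
dist(rbind(a, b), method = "minkowski", p = 3)
```

Euclidean and Manhattan distance are the Minkowski distance with p = 2 and p = 1, and Chebyshev distance is its limit as p grows.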

Format

R6Class object.

Details

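K-Nearest Neighbor (KNN) is a memory-based method for classification or regression with no explicit training phase: the training data are stored, and each new point is assigned a label (or value) derived from its k nearest training points. It assumes that similar instances have similar labels or values.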
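To make the memory-based idea concrete, here is a minimal base-R sketch of KNN classification using brute-force search and Euclidean distance, roughly the counterpart of `algorithm = "brute-force"` and `metric = "euclidean"`. It is not the PAL implementation: the function `knn_predict` is hypothetical, and the inverse-distance weights used for "distance-weighted" voting are an assumption, since the exact PAL weighting is not documented here. The training data are copied from the Examples section.

```r
# Hypothetical sketch (not hana.ml.r or PAL): KNN classification in base R.
# There is no training step; the "model" is simply the stored training rows.
knn_predict <- function(train_x, train_y, query, k = 1,
                        voting = c("distance-weighted", "majority")) {
  voting <- match.arg(voting)
  # Brute force: Euclidean distance from the query to every training row
  dists <- sqrt(rowSums(sweep(train_x, 2, query)^2))
  k <- min(k, length(dists))
  nn <- order(dists)[seq_len(k)]           # indices of the k nearest neighbours
  labels <- as.character(train_y[nn])
  if (voting == "majority") {
    weights <- rep(1, k)                   # one vote per neighbour
  } else {
    weights <- 1 / (dists[nn] + 1e-12)     # assumed inverse-distance weighting
  }
  scores <- tapply(weights, labels, sum)   # total vote per label
  names(scores)[which.max(scores)]
}

# Training data from the Examples section
train <- data.frame(
  X1   = c(1, 10, 10, 10, 1000, 1000, 1000, 999, 999, 1000),
  X2   = c(1, 10, 11, 10, 1000, 1001, 999, 999, 1000, 1000),
  TYPE = c(2, 3, 3, 3, 1, 1, 1, 1, 1, 1)
)
x <- as.matrix(train[, c("X1", "X2")])

knn_predict(x, train$TYPE, c(2, 2), k = 3, voting = "majority")
knn_predict(x, train$TYPE, c(2, 2), k = 3, voting = "distance-weighted")
```

For the query point (2, 2) with k = 3, the neighbours are one TYPE 2 point at distance about 1.4 and two TYPE 3 points at distance about 11.3. Majority voting returns "3", while distance-weighted voting returns "2" because the single close neighbour outweighs the two distant ones.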

Value

A "Knn" object.

Examples

## Not run: 
Training data:

> df$Collect()
    ID      X1      X2  TYPE
 0   0     1.0     1.0     2
 1   1    10.0    10.0     3
 2   2    10.0    11.0     3
 3   3    10.0    10.0     3
 4   4  1000.0  1000.0     1
 5   5  1000.0  1001.0     1
 6   6  1000.0   999.0     1
 7   7   999.0   999.0     1
 8   8   999.0  1000.0     1
 9   9  1000.0  1000.0     1

> knn <- hanaml.Knn(connection.context, df, key = "ID", features = list("X1", "X2"),
                   label = "TYPE", n.neighbors = 3, voting.type = "majority",
                   thread.ratio = 0.1, stat.info = FALSE)

## End(Not run)

[Package hana.ml.r version 1.0.8 Index]