hanaml.DBSCAN {hana.ml.r}

R Documentation

DBSCAN (Density-Based Spatial Clustering of Applications with Noise)

Description

hanaml.DBSCAN is an R wrapper for the PAL DBSCAN algorithm.

Usage

hanaml.DBSCAN(conn.context,
              data = NULL,
              key = NULL,
              features = NULL,
              minpts = NULL,
              eps = NULL,
              thread.ratio = NULL,
              metric = NULL,
              minkowski.power = NULL,
              categorical.variable = NULL,
              category.weights = NULL,
              algorithm = NULL,
              save.model = NULL)

Arguments

`conn.context`	`ConnectionContext` Connection to the SAP HANA System
`data`	`DataFrame` DataFrame containing the data.
`key`	`character` Name of ID column.
`features`	`character or list of characters, optional` Names of the feature columns. If not provided, defaults to all non-ID columns.
`minpts`	`integer, optional` The minimum number of points required to form a cluster. minpts and eps must be provided together; if both are omitted, the two parameters are determined automatically.
`eps`	`double, optional` The scan radius. minpts and eps must be provided together; if both are omitted, the two parameters are determined automatically.
`thread.ratio`	`double, optional` Controls the proportion of available threads to use. The value range is from 0 to 1, where 0 indicates a single thread, and 1 indicates up to all available threads. Values between 0 and 1 will use that percentage of available threads. Values outside this range tell PAL to heuristically determine the number of threads to use. Defaults to -1.
`metric`	`character, optional` Method used to compute the distance between two points. Valid options are: `'manhattan'`, `'euclidean'`, `'minkowski'`, `'chebyshev'`, `'standardized.euclidean'`, `'cosine'`. Defaults to "euclidean".
`minkowski.power`	`integer, optional` The power used by the Minkowski distance. Only applicable when metric is 'minkowski'. Defaults to 3.
`categorical.variable`	`character or list of characters, optional` Specifies column(s) in the data that should be treated as categorical. No default value.
`category.weights`	`double, optional` Represents the weight of category attributes. Defaults to 0.707.
`algorithm`	`{"brute.force", "kd.tree"}, optional` Method used to search for neighbours. Defaults to "kd.tree".
`save.model`	`logical, optional` If TRUE, the generated model will be saved. save.model must be TRUE to call. Defaults to TRUE.
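
To make the roles of `metric`, `minkowski.power`, `minpts` and `eps` more concrete, here is a minimal base R sketch. It is not PAL code and does not call hana.ml.r. The helper names `point_distance` and `is_core_point` are hypothetical, the cosine formula is the usual "one minus cosine similarity" definition, and counting a point as its own neighbour is an assumption about the convention. 'standardized.euclidean' is left out because it needs per-column variances.

```r
# Illustrative only: plain R versions of several metric options.
point_distance <- function(a, b, metric = "euclidean", minkowski.power = 3) {
  d <- abs(a - b)
  switch(metric,
    manhattan = sum(d),
    euclidean = sqrt(sum(d^2)),
    minkowski = sum(d^minkowski.power)^(1 / minkowski.power),
    chebyshev = max(d),
    cosine    = 1 - sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2))),
    stop("metric not covered by this sketch: ", metric)
  )
}

# Assumption: a point counts as a core point when at least `minpts` points,
# itself included, lie within distance `eps` of it.
is_core_point <- function(points, minpts, eps, metric = "euclidean") {
  n <- nrow(points)
  vapply(seq_len(n), function(i) {
    dists <- vapply(seq_len(n), function(j) {
      point_distance(points[i, ], points[j, ], metric)
    }, numeric(1))
    sum(dists <= eps) >= minpts
  }, logical(1))
}

# Three tightly packed points and one distant point.
pts <- rbind(c(0.10, 0.10), c(0.11, 0.10), c(0.10, 0.11), c(16.11, 16.11))
core <- is_core_point(pts, minpts = 3, eps = 0.05)
print(core)
```

With these toy values the three nearby points each have three neighbours within `eps`, while the distant point has only itself, so a larger `eps` or smaller `minpts` would be needed for it to qualify.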

Format

R6Class object.

Value

Returns a "DBSCAN" object with the following attributes:

labels : DataFrame
Label assigned to each sample.
model : DataFrame
PMML model. Set to NULL if no PMML model was requested.

Examples

## Not run: 
 Input DataFrame data:
 > data$Collect()
        ID     V1     V2 V3
   0    1   0.10   0.10  B
   1    2   0.11   0.10  A
   2    3   0.10   0.11  C
   3    4   0.11   0.11  B
   4    5   0.12   0.11  A
   5    6   0.11   0.12  E
   ...
   27  28  16.11  16.11  A
   28  29  20.11  20.12  C
   29  30  15.12  15.11  A

 Create a DBSCAN object:

 > DBSCAN <- hanaml.DBSCAN(conn, data, thread.ratio = 0.2,
                           metric = "manhattan")

 Expected output:
 > DBSCAN$labels$Collect()
             ID    CLUSTER.ID
       1      1          0
       2      2          0
       3      3          0
       4      4          0
       5      5          0
       ...
       28    28         -1
       29    29         -1
       30    30         -1

## End(Not run)
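
Once the labels have been collected into a local data.frame, they can be summarised with base R. The sketch below uses a hand-typed local copy of the rows visible in the expected output above rather than a live connection. It assumes that a CLUSTER.ID of -1 marks points not assigned to any cluster, which is the usual DBSCAN noise convention but is not stated on this page.

```r
# Hypothetical local copy of the rows shown in the expected output;
# in practice this would come from DBSCAN$labels$Collect().
labels <- data.frame(
  ID         = c(1, 2, 3, 4, 5, 28, 29, 30),
  CLUSTER.ID = c(0, 0, 0, 0, 0, -1, -1, -1)
)

# Assumption: CLUSTER.ID == -1 means the point was not assigned to a cluster.
is_unassigned <- labels$CLUSTER.ID == -1

# IDs of unassigned points.
unassigned <- labels$ID[is_unassigned]

# Number of points in each real cluster.
cluster_sizes <- table(labels$CLUSTER.ID[!is_unassigned])

print(unassigned)
print(cluster_sizes)
```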

[Package hana.ml.r version 1.0.8 Index]