hanaml.DBSCAN {hana.ml.r}    R Documentation

DBSCAN (Density-Based Spatial Clustering of Applications with Noise)

Description

hanaml.DBSCAN is an R wrapper for the DBSCAN algorithm of the SAP HANA Predictive Analysis Library (PAL).

Usage

hanaml.DBSCAN(conn.context,
              data = NULL,
              key = NULL,
              features = NULL,
              minpts = NULL,
              eps = NULL,
              thread.ratio = NULL,
              metric = NULL,
              minkowski.power = NULL,
              categorical.variable = NULL,
              category.weights = NULL,
              algorithm = NULL,
              save.model = NULL)

Arguments

conn.context

ConnectionContext
Connection to the SAP HANA system.

data

DataFrame
DataFrame containing the data.

key

character
Name of the ID column.

features

character or list of characters, optional
Names of the feature columns.
If not provided, defaults to all non-ID columns.

minpts

integer, optional
The minimum number of points required to form a cluster.
Note that minpts and eps must either both be provided by the user or both be omitted, in which case both are determined automatically.

eps

double, optional
The scan radius.
Note that minpts and eps must either both be provided by the user or both be omitted, in which case both are determined automatically.
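To make the interplay between minpts and eps concrete, the following base-R sketch implements the textbook DBSCAN procedure: a point with at least minpts neighbours within distance eps is a core point, clusters grow outward through core points, and points never reached are noise. It is an illustration only, not PAL's implementation. The function name dbscan_sketch is hypothetical, and the sketch assumes a brute-force Euclidean distance matrix, counts each point as its own neighbour, numbers clusters from 0 and labels noise -1 (the same conventions visible in the example output further down).

```r
# Hypothetical helper: textbook DBSCAN in base R (illustration, not PAL's code).
dbscan_sketch <- function(x, eps, minpts) {
  x <- as.matrix(x)
  n <- nrow(x)
  d <- as.matrix(dist(x))            # brute-force Euclidean distance matrix
  # Neighbours of i within eps; each point counts as its own neighbour here.
  neighbours <- lapply(seq_len(n), function(i) which(d[i, ] <= eps))
  is_core <- vapply(neighbours, length, integer(1)) >= minpts
  labels <- rep(NA_integer_, n)
  cluster <- -1L
  for (i in seq_len(n)) {
    if (!is.na(labels[i]) || !is_core[i]) next
    cluster <- cluster + 1L          # start a new cluster at an unlabelled core point
    labels[i] <- cluster
    queue <- neighbours[[i]]
    while (length(queue) > 0) {
      j <- queue[1]
      queue <- queue[-1]
      if (is.na(labels[j])) {
        labels[j] <- cluster
        # Only core points extend the cluster; border points are absorbed.
        if (is_core[j]) queue <- c(queue, neighbours[[j]])
      }
    }
  }
  labels[is.na(labels)] <- -1L       # points never reached are noise
  labels
}

# Two tight groups of four points plus one isolated point.
pts <- rbind(c(0, 0), c(0, 0.1), c(0.1, 0), c(0.1, 0.1),
             c(5, 5), c(5, 5.1), c(5.1, 5), c(5.1, 5.1),
             c(20, 20))
dbscan_sketch(pts, eps = 0.5, minpts = 3)
# returns 0 0 0 0 1 1 1 1 -1
```

Raising minpts above 4 in this usage turns every point into noise, while shrinking eps below the 0.1 spacing does the same; this is why the two parameters are tuned, or auto-determined, together.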

thread.ratio

double, optional
Controls the proportion of available threads to use. The value range is from 0 to 1, where 0 indicates a single thread and 1 indicates up to all available threads. Values between 0 and 1 use that proportion of the available threads. Values outside this range tell PAL to determine the number of threads heuristically.
Defaults to -1.

metric

character, optional
Method used to compute the distance between two points. Valid options are:

  • 'manhattan'

  • 'euclidean'

  • 'minkowski'

  • 'chebyshev'

  • 'standardized.euclidean'

  • 'cosine'

Defaults to "euclidean".
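The following base-R sketch spells out the usual textbook formulas behind each metric option. It is illustrative only: the helper names are hypothetical, and it assumes PAL follows the common definitions (per-feature variances for standardized Euclidean, 1 minus cosine similarity for cosine), which this page does not confirm.

```r
# Hypothetical helpers: textbook distance formulas (assumed, not confirmed, to match PAL).
manhattan <- function(x, y) sum(abs(x - y))
euclidean <- function(x, y) sqrt(sum((x - y)^2))
# p plays the role of minkowski.power (default 3 on this page).
minkowski <- function(x, y, p = 3) sum(abs(x - y)^p)^(1 / p)
chebyshev <- function(x, y) max(abs(x - y))
# s2: per-feature variances, computed over the whole data set.
standardized_euclidean <- function(x, y, s2) sqrt(sum((x - y)^2 / s2))
# Assumed convention: cosine distance = 1 - cosine similarity.
cosine <- function(x, y) 1 - sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))

x <- c(1, 2)
y <- c(4, 6)
manhattan(x, y)   # 7
euclidean(x, y)   # 5
chebyshev(x, y)   # 4
```

Minkowski with p = 1 reduces to Manhattan and with p = 2 to Euclidean, so minkowski.power interpolates between the other metrics.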

minkowski.power

integer, optional
The power of the Minkowski distance. Only applicable when metric is 'minkowski'.
Defaults to 3.

categorical.variable

character or list of characters, optional
Specifies column(s) in the data that should be treated as categorical. No default value.

category.weights

double, optional
Weight of the categorical attributes in the distance computation.
Defaults to 0.707.

algorithm

{"brute.force", "kd.tree"}, optional
Method used to search for neighbours.
Defaults to "kd.tree".

save.model

logical, optional
If TRUE, the generated model is saved. save.model must be TRUE in order to call predict.DBSCAN.
Defaults to TRUE.

Format

R6Class object.

Value

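Returns a "DBSCAN" object with the following attributes:

labels

DataFrame
Cluster label assigned to each sample, with the ID column and a CLUSTER.ID column. A CLUSTER.ID of -1 marks a noise point.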

See Also

predict.DBSCAN

Examples

## Not run: 
 Input DataFrame data:
 > data$collect()
        ID     V1     V2 V3
   0    1   0.10   0.10  B
   1    2   0.11   0.10  A
   2    3   0.10   0.11  C
   3    4   0.11   0.11  B
   4    5   0.12   0.11  A
   5    6   0.11   0.12  E
   ...
   27  28  16.11  16.11  A
   28  29  20.11  20.12  C
   29  30  15.12  15.11  A

 Create a DBSCAN object:

 > DBSCAN <- hanaml.DBSCAN(conn, data, key = "ID", thread.ratio = 0.2,
                           metric = "manhattan")

 Expected output:
 > DBSCAN$labels$collect()
             ID    CLUSTER.ID
       1      1          0
       2      2          0
       3      3          0
       4      4          0
       5      5          0
       ...
       28    28         -1
       29    29         -1
       30    30         -1

## End(Not run)

[Package hana.ml.r version 1.0.8 Index]