hanaml.GeoDBSCAN.Rd
hanaml.GeoDBSCAN is an R wrapper for the SAP HANA PAL GeoDBSCAN algorithm.
hanaml.GeoDBSCAN(
data = NULL,
key = NULL,
features = NULL,
minpts = NULL,
eps = NULL,
thread.ratio = NULL,
metric = NULL,
minkowski.power = NULL,
algorithm = NULL,
save.model = NULL
)
DataFrame
DataFrame containing the data.
character
Name of ID column.
character, optional
Name of the feature column. GeoDBSCAN only supports one feature.
If not provided, it defaults to the first non-ID column.
integer, optional
The minimum number of points required to form a cluster.
Note that minpts and eps must either both be provided by the user
or both be left unset, in which case the two parameters are
determined automatically.
double, optional
The scan radius.
Note that minpts and eps must either both be provided by the user
or both be left unset, in which case the two parameters are
determined automatically.
double, optional
Controls the proportion of available threads that can be used by this
function.
The value range is from 0 to 1, where 0 indicates a single thread,
and 1 indicates all available threads. Values between 0 and 1 will use up to
that percentage of available threads.
Values outside the range from 0 to 1 are ignored, and the actual number of threads
used is then heuristically determined.
Defaults to -1.
character, optional
Ways to compute the distance between two points. Valid metric options include:
'manhattan'
'euclidean'
'minkowski'
'chebyshev'
'standardized.euclidean'
'cosine'
Defaults to "euclidean".
integer, optional
The power used by the Minkowski distance.
Only applicable when metric is 'minkowski'.
Defaults to 3.
{"brute.force", "kd.tree"}, optional
Ways to search for neighbours.
Defaults to "kd.tree".
logical, optional
If TRUE, the generated model will be saved.
save.model must be TRUE to call.
Defaults to TRUE.
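The following base R sketch illustrates the textbook definitions of four of the metric options listed above, using two points taken from the sample data further down this page. It is only an illustration of the formulas: the helper names (manhattan, euclidean, minkowski, chebyshev) are hypothetical and are not part of hana.ml.r, and PAL computes the distances inside SAP HANA, not in R.

```r
# Illustrative definitions only; these helpers are hypothetical and are
# not part of hana.ml.r or of the PAL implementation.
manhattan <- function(a, b) sum(abs(a - b))
euclidean <- function(a, b) sqrt(sum((a - b)^2))
minkowski <- function(a, b, power = 3) sum(abs(a - b)^power)^(1 / power)
chebyshev <- function(a, b) max(abs(a - b))

# Two points from the example table (rows 1 and 5).
p <- c(0.10, 0.10)
q <- c(0.12, 0.11)

print(c(manhattan = manhattan(p, q),
        euclidean = euclidean(p, q),
        minkowski = minkowski(p, q),
        chebyshev = chebyshev(p, q)))
```

With power = 1 the Minkowski distance reduces to the Manhattan distance, and with power = 2 to the Euclidean distance, which is why minkowski.power only matters when metric is 'minkowski'.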
An R6 object of class "GeoDBSCAN" with the following attributes and methods:
Attributes
labels : DataFrame
Label assigned to each sample. -1 means the point is labeled as noise.
model : DataFrame
PMML model.
Set to NULL if no PMML model was requested.
Methods
CreateModelState(model=NULL, algorithm=NULL, func=NULL, state.description="ModelState", force=FALSE)
Usage:
> gdbs <- hanaml.GeoDBSCAN(data=df, key="ID")
> gdbs$CreateModelState()
Arguments:
model: DataFrame
DataFrame containing the model for parsing.
Defaults to self$model.
algorithm: character
Specifies the PAL algorithm associated with model.
Defaults to self$pal.algorithm.
func: character
Specifies the functionality for Unified Classification/Regression.
Valid only for object instance of R6Class "UnifiedClassification" or "UnifiedRegression".
Defaults to self$func.
state.description: character
A summary string for the generated model state.
Defaults to "ModelState".
force: logical
Specifies whether or not to replace the existing state for model.
Defaults to FALSE.
After calling this method, an attribute state that contains the parsed info for model
is assigned to the corresponding R6 object.
DeleteModelState(state=NULL)
Usage:
Assuming we have trained a hanaml model and created its model state, like the following:
> gdbs <- hanaml.GeoDBSCAN(data=df)
> gdbs$CreateModelState()
After using the model state for real-time scoring, we can delete the state by calling:
> gdbs$DeleteModelState()
Arguments:
state: DataFrame
DataFrame containing the state info.
Defaults to self$state.
After calling this method, the specified model state is cleaned up and the associated memory is released.
In SAP HANA, the test table PAL_GEO_DBSCAN_DATA_TBL can be created by the following SQL:
CREATE COLUMN TABLE PAL_GEO_DBSCAN_DATA_TBL (
"ID" INTEGER,
"POINT" ST_GEOMETRY);
Input DataFrame for clustering:
> data$Collect()
ID POINT
1 1 SRID=0;POINT (0.1 0.1)
2 2 SRID=0;POINT (0.11 0.1)
3 3 SRID=0;POINT (0.1 0.11)
4 4 SRID=0;POINT (0.11 0.11)
5 5 SRID=0;POINT (0.12 0.11)
......
28 28 SRID=0;POINT (16.11 16.11)
29 29 SRID=0;POINT (20.11 20.12)
30 30 SRID=0;POINT (15.12 15.11)
Call the function:
> GeoDBSCAN <- hanaml.GeoDBSCAN(data,
key = "ID",
thread.ratio = 0.2,
metric = "manhattan")
Output:
> GeoDBSCAN$labels$Collect()
ID CLUSTER.ID
1 1 0
2 2 0
3 3 0
4 4 0
5 5 0
......
28 28 -1
29 29 -1
30 30 -1
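As a hedged follow-up, the sketch below shows one way to separate noise points from clustered points once the labels have been collected. It assumes the collected result behaves like an ordinary R data.frame with the columns ID and CLUSTER.ID as displayed above; the data frame here is rebuilt by hand from only the rows shown in the output (the elided rows are omitted), so it runs without a HANA connection.

```r
# Rows copied from the output shown above; elided rows are left out.
# Assumption: the collected labels are an ordinary data.frame with
# columns ID and CLUSTER.ID.
labels <- data.frame(
  ID         = c(1, 2, 3, 4, 5, 28, 29, 30),
  CLUSTER.ID = c(0, 0, 0, 0, 0, -1, -1, -1)
)

# A cluster id of -1 marks a point labeled as noise.
noise     <- labels[labels$CLUSTER.ID == -1, ]
clustered <- labels[labels$CLUSTER.ID != -1, ]

# Number of points per cluster, noise excluded.
cluster.sizes <- table(clustered$CLUSTER.ID)

print(noise$ID)
print(cluster.sizes)
```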