hanaml.GeoDBSCAN is an R wrapper for the SAP HANA PAL GeoDBSCAN algorithm.
hanaml.GeoDBSCAN(
  data = NULL,
  key = NULL,
  features = NULL,
  minpts = NULL,
  eps = NULL,
  thread.ratio = NULL,
  metric = NULL,
  minkowski.power = NULL,
  algorithm = NULL,
  save.model = NULL
)
| data | |
|---|---|
| key | |
| features | |
| minpts | |
| eps | |
| thread.ratio | |
| metric | Defaults to "euclidean". |
| minkowski.power | |
| algorithm | |
| save.model | |
A "GeoDBSCAN" object with the following attributes:
labels : DataFrame
Label assigned to each sample. -1 means the point is labeled as noise.
model : DataFrame
PMML model.
Set to NULL if no PMML model was requested.
In SAP HANA, the test table PAL_GEO_DBSCAN_DATA_TBL can be created by the following SQL:
CREATE COLUMN TABLE PAL_GEO_DBSCAN_DATA_TBL (
"ID" INTEGER,
"POINT" ST_GEOMETRY);
Input DataFrame for clustering:
> data$Collect()
ID POINT
1 1 SRID=0;POINT (0.1 0.1)
2 2 SRID=0;POINT (0.11 0.1)
3 3 SRID=0;POINT (0.1 0.11)
4 4 SRID=0;POINT (0.11 0.11)
5 5 SRID=0;POINT (0.12 0.11)
......
28 28 SRID=0;POINT (16.11 16.11)
29 29 SRID=0;POINT (20.11 20.12)
30 30 SRID=0;POINT (15.12 15.11)
Call the function:
> GeoDBSCAN <- hanaml.GeoDBSCAN(data,
                                key = "ID",
                                thread.ratio = 0.2,
                                metric = "Manhattan")
Output:
> GeoDBSCAN$labels$Collect()
ID CLUSTER.ID
1 1 0
2 2 0
3 3 0
4 4 0
5 5 0
......
28 28 -1
29 29 -1
30 30 -1
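Once the labels have been collected into a local data.frame, noise points can be separated from clustered points with base R. The sketch below is only an illustration: it assumes the collected result is an ordinary data.frame with the ID and CLUSTER.ID columns shown in the output, and it rebuilds just the rows printed there (the elided rows 6 to 27 are left out). The variable names are hypothetical.

```r
# Local stand-in for the collected labels; only the rows printed in the
# output are reproduced here (rows 6 to 27 are elided in the source).
labels <- data.frame(
  ID = c(1, 2, 3, 4, 5, 28, 29, 30),
  CLUSTER.ID = c(0, 0, 0, 0, 0, -1, -1, -1)
)

# A CLUSTER.ID of -1 marks a noise point; any other value is a cluster label.
is.noise <- labels$CLUSTER.ID == -1

# IDs of the points labeled as noise.
noise.ids <- labels$ID[is.noise]

# Number of points per cluster, with noise excluded.
cluster.sizes <- table(labels$CLUSTER.ID[!is.noise])

print(noise.ids)
print(cluster.sizes)
```

Filtering on -1 before tabulating keeps the noise bucket from being counted as a cluster of its own.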