hanaml.AffinityPropagation {hana.ml.r}

R Documentation

Affinity Propagation

Description

hanaml.AffinityPropagation is an R wrapper for the PAL Affinity Propagation algorithm.

Usage

hanaml.AffinityPropagation(conn.context,
                           data,
                           key,
                           features = NULL,
                           affinity,
                           n.clusters,
                           max.iter = NULL,
                           convergence.iter = NULL,
                           damping = NULL,
                           preference = NULL,
                           seed.ratio = NULL,
                           times = NULL,
                           minkowski.power = NULL,
                           thread.ratio = NULL)

Arguments

`conn.context`	`ConnectionContext` Connection to the SAP HANA system.
`data`	`DataFrame` DataFrame containing the data.
`key`	`character` Name of the ID column.
`features`	`character or list of characters, optional` Names of the feature columns.
`affinity`	`character` Method used to compute the distance between two points. Valid options: `'manhattan'`, `'euclidean'`, `'minkowski'`, `'chebyshev'`, `'standardized.euclidean'`, `'cosine'`. Mandatory; there is no default value.
`n.clusters`	`integer` Target number of clusters. `0`: the Affinity Propagation cluster result is not adjusted. `Non-zero integer`: if Affinity Propagation produces more than n.clusters clusters, PAL merges the result so that the number of clusters equals n.clusters.
`max.iter`	`integer, optional` Maximum number of iterations. Defaults to 500.
`convergence.iter`	`integer, optional` The algorithm stops when the clusters remain unchanged for this number of iterations. Defaults to 100.
`damping`	`double, optional` Controls the updating velocity. Value range: (0, 1). Defaults to 0.9.
`preference`	`double, optional` Determines the preference. Value range: [0,1]. Defaults to 0.5.
`seed.ratio`	`double, optional` Selects a portion (seed.ratio * data_number) of the input data as seeds, where data_number is the number of rows in the input data. Value range: (0,1]. If seed.ratio is 1, all the input data are used as seeds. Defaults to 1.
`times`	`integer, optional` The number of times sampling is performed. Only valid when seed.ratio is less than 1 and affinity is 'minkowski'. Defaults to 1.
`minkowski.power`	`integer, optional` The power of the Minkowski distance. Only valid when affinity is 'minkowski'. Defaults to 1.
`thread.ratio`	`numeric, optional` Specifies the ratio of total number of threads that can be used by this function. The value range is from 0 to 1, where 0 means only using 1 thread, and 1 means using at most all the currently available threads. Values outside the range will be ignored and this function heuristically determines the number of threads to use. Defaults to 0.
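
To give a feel for how the choice of `affinity` changes the distance between two points, the sketch below computes several of the listed measures locally in base R. It is only an illustration and makes assumptions: it does not call PAL, the helper `cosine_distance` is hypothetical, and treating `'chebyshev'` as base R's `"maximum"` method and `'cosine'` as 1 minus cosine similarity are the usual textbook definitions rather than something this page confirms for PAL. The two points are the first and last V1/V2 pairs that appear in the example data on this page.

```r
# Two points from the example data (columns V1, V2); local matrix, not a HANA DataFrame.
x <- rbind(a = c(0.10, 0.10), b = c(10.13, 10.12))

# Base R's dist() covers four of the listed measures.
d_manhattan <- as.numeric(dist(x, method = "manhattan"))
d_euclidean <- as.numeric(dist(x, method = "euclidean"))
d_chebyshev <- as.numeric(dist(x, method = "maximum"))        # assumed equivalent of 'chebyshev'
d_minkowski <- as.numeric(dist(x, method = "minkowski", p = 3)) # p = 3 chosen only for illustration

# Hypothetical helper: cosine distance as 1 - cosine similarity.
cosine_distance <- function(u, v) {
  1 - sum(u * v) / (sqrt(sum(u^2)) * sqrt(sum(v^2)))
}
d_cosine <- cosine_distance(x["a", ], x["b", ])

print(c(manhattan = d_manhattan, euclidean = d_euclidean,
        chebyshev = d_chebyshev, minkowski = d_minkowski,
        cosine = d_cosine))
```

Note that the two points are far apart under the first four measures but almost identical under cosine distance, because they lie in nearly the same direction from the origin. Data like the example set, with two groups along the same diagonal, would therefore likely be hard to separate with `'cosine'`.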

Format

R6Class object.

Value

An "AffinityPropagation" object with the following attributes:

labels : DataFrame
Label assigned to each sample, structured as follows:
- ID, record ID.
- CLUSTER_ID, the range is from 0 to n.clusters - 1.
statistics : DataFrame
Statistic value, structured as follows:
- STAT_NAME, Statistic name.
- STAT_VALUE, Statistic value.

Examples

## Not run: 
Input DataFrame data:
> data$collect()
       ID     V1     V2
   0    1   0.10   0.10
   1    2   0.11   0.10
   2    3   0.10   0.11
   3    4   0.11   0.11
   4    5   0.12   0.11
   5    6   0.11   0.12
   ...
   20  21  10.13  10.12
   21  22  10.13  10.13
   22  23  10.13  10.14
   23  24  10.14  10.13

Create an AffinityPropagation instance:
> ap <- hanaml.AffinityPropagation(conn.context = conn,
                                   data = data,
                                   key = 'ID',
                                   affinity = 'euclidean',
                                   n.clusters = 0L,
                                   max.iter = 500L,
                                   convergence.iter = 100L,
                                   damping = 0.9,
                                   preference = 0.5,
                                   times = 1L,
                                   seed.ratio = 1,
                                   minkowski.power = 0,
                                   thread.ratio = 0)
Expected output:
> ap$labels$collect()
    ID  CLUSTER_ID
0    1           0
1    2           0
2    3           0
3    4           0
4    5           0
5    6           0
...
21  22           1
22  23           1
23  24           1

## End(Not run)
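
Once the labels have been collected, they can be summarized with ordinary base R tools. The following is a minimal sketch, assuming that `collect()` returns a regular local data.frame as the printed output above suggests. The object `labels_local` is hypothetical and hand-built from the rows shown in the expected output (the elided rows are not reconstructed), so no HANA connection is needed to run it.

```r
# Hand-built stand-in for ap$labels$collect(), using only the rows shown above.
labels_local <- data.frame(
  ID = c(1L, 2L, 3L, 4L, 5L, 6L, 22L, 23L, 24L),
  CLUSTER_ID = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L)
)

# Number of records assigned to each cluster.
cluster_sizes <- table(labels_local$CLUSTER_ID)
print(cluster_sizes)

# Record IDs grouped by cluster, as a named list keyed by CLUSTER_ID.
ids_by_cluster <- split(labels_local$ID, labels_local$CLUSTER_ID)
print(ids_by_cluster)
```

With a real result, the same two calls could be applied directly to the data.frame returned by `ap$labels$collect()`.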

[Package hana.ml.r version 1.0.8 Index]