Affinity Propagation — hanaml.AffinityPropagation • hana.ml.r

hanaml.AffinityPropagation is an R wrapper for the SAP HANA PAL Affinity Propagation algorithm.

hanaml.AffinityPropagation(
  data,
  key,
  features = NULL,
  affinity,
  n.clusters,
  max.iter = NULL,
  convergence.iter = NULL,
  damping = NULL,
  preference = NULL,
  seed.ratio = NULL,
  times = NULL,
  minkowski.power = NULL,
  thread.ratio = NULL
)

Arguments

data	`DataFrame` DataFrame containing the data.
key	`character` Name of the ID column.
features	`character or list of characters, optional` Names of the feature columns. If not provided, it defaults to all non-key columns of `data`.
affinity	`character` Method used to compute the distance between two points. Valid options: `'manhattan'`, `'euclidean'`, `'minkowski'`, `'chebyshev'`, `'standardized.euclidean'`, `'cosine'`. Mandatory; there is no default value.
n.clusters	`integer` `0`: Does not adjust the Affinity Propagation clustering result. `Non-zero integer`: If the number of clusters found by Affinity Propagation is larger than `n.clusters`, PAL merges clusters until the number of clusters equals `n.clusters`.
max.iter	`integer, optional` Maximum number of iterations. Defaults to 500.
convergence.iter	`integer, optional` The algorithm ends when the clusters remain unchanged for the specified number of iterations. Defaults to 100.
damping	`double, optional` Controls the updating velocity. Value range: (0, 1). Defaults to 0.9.
preference	`double, optional` Determines the preference. Value range: [0,1]. Defaults to 0.5.
seed.ratio	`double, optional` Proportion of the input data used as seed: (`seed.ratio` * data_number) rows are selected, where data_number is the number of rows in the input data. Value range: (0,1]. If `seed.ratio` is 1, all the input data is used as seed. Defaults to 1.
times	`integer, optional` The number of sampling times. Only valid when `seed.ratio` is less than 1. Defaults to 1.
minkowski.power	`integer, optional` The power of the Minkowski distance. Only valid when `affinity` is `'minkowski'`. Defaults to 3.
thread.ratio	`double, optional` Controls the proportion of available threads that can be used by this function. The value range is from 0 to 1, where 0 indicates a single thread and 1 indicates all available threads. Values between 0 and 1 use up to that percentage of available threads. Values outside this range are ignored. Defaults to 0.
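
As a rough local illustration of the distance measures named by `affinity`, the sketch below computes several of them in base R for two points taken from the example data (the rows with ID 1 and ID 21). It uses `stats::dist()` and a hand-written helper, not PAL, so PAL's exact definitions may differ. The helper `cosine_distance` is hypothetical, and defining cosine distance as one minus cosine similarity is an assumption. `'standardized.euclidean'` is left out because it depends on per-column variances of the full data set.

```r
# Two illustrative points from the example data (ID 1 and ID 21).
x <- rbind(c(0.10, 0.10), c(10.13, 10.12))

# Distances available through base R's stats::dist().
d_manhattan  <- as.numeric(dist(x, method = "manhattan"))
d_euclidean  <- as.numeric(dist(x, method = "euclidean"))
d_chebyshev  <- as.numeric(dist(x, method = "maximum"))
d_minkowski3 <- as.numeric(dist(x, method = "minkowski", p = 3))

# Hypothetical helper: cosine distance assumed to be 1 - cosine similarity.
cosine_distance <- function(a, b) {
  1 - sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}
d_cosine <- cosine_distance(x[1, ], x[2, ])

print(c(manhattan = d_manhattan,
        euclidean = d_euclidean,
        chebyshev = d_chebyshev,
        minkowski_p3 = d_minkowski3,
        cosine = d_cosine))
```

The Minkowski distance with power 1 equals the Manhattan distance and with power 2 equals the Euclidean distance, which is why a power only has to be supplied for the `'minkowski'` option.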

Value

An "AffinityPropagation" object with the following attributes:

labels : DataFrame
Label assigned to each sample, structured as follows:
- ID record ID.
- CLUSTER_ID the range is from 0 to n.clusters - 1.
statistics : DataFrame
Statistics, structured as follows:
- STAT_NAME Statistic name.
- STAT_VALUE Statistic value.

Examples

Input DataFrame data:

> data$Collect()
    ID     V1     V2
1    1   0.10   0.10
2    2   0.11   0.10
3    3   0.10   0.11
4    4   0.11   0.11
5    5   0.12   0.11
6    6   0.11   0.12
21  21  10.13  10.12
22  22  10.13  10.13
23  23  10.13  10.14
24  24  10.14  10.13

Call the function:

> ap <- hanaml.AffinityPropagation(data = data,
                                   key = "ID",
                                   affinity = "euclidean",
                                   n.clusters = 0L,
                                   max.iter = 500L,
                                   convergence.iter = 100L,
                                   damping = 0.9,
                                   preference = 0.5,
                                   times = 1L,
                                   seed.ratio = 1,
                                   minkowski.power = 0,
                                   thread.ratio = 0)

Output:

> ap$labels$Collect()
    ID  CLUSTER_ID
1    1           0
2    2           0
3    3           0
4    4           0
5    5           0
6    6           0
......
22  22           1
23  23           1
24  24           1
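
Once the labels have been collected into a local data frame, cluster sizes can be tallied with base R. The sketch below is illustrative only: `labels_local` is a hypothetical stand-in built from the rows visible in the output above (the elided rows are left out), not the result of a live call.

```r
# Hypothetical local copy of the visible rows of ap$labels (elided rows omitted).
labels_local <- data.frame(
  ID = c(1:6, 22:24),
  CLUSTER_ID = c(rep(0L, 6), rep(1L, 3))
)

# Count how many samples fall into each cluster.
cluster_sizes <- table(labels_local$CLUSTER_ID)
print(cluster_sizes)

# List the IDs belonging to each cluster.
ids_by_cluster <- split(labels_local$ID, labels_local$CLUSTER_ID)
print(ids_by_cluster)
```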