hanaml.AffinityPropagation.Rd
hanaml.AffinityPropagation is an R wrapper for the SAP HANA PAL Affinity Propagation algorithm.
hanaml.AffinityPropagation(
data,
key,
features = NULL,
affinity,
n.clusters,
max.iter = NULL,
convergence.iter = NULL,
damping = NULL,
preference = NULL,
seed.ratio = NULL,
times = NULL,
minkowski.power = NULL,
thread.ratio = NULL
)
data : DataFrame
DataFrame containing the data.
key : character
Name of the ID column.
features : character or list of characters, optional
Names of the feature columns.
If not provided, it defaults to all non-key columns of data.
affinity : character
Method used to compute the distance between two points:
'manhattan'
'euclidean'
'minkowski'
'chebyshev'
'standardized.euclidean'
'cosine'
No default value, as it is mandatory.
n.clusters : integer
0 : Does not adjust the Affinity Propagation cluster result.
Non-zero integer : If the number of Affinity Propagation clusters is greater than n.clusters, PAL merges the result so that the number of clusters equals the value specified for n.clusters.
max.iter : integer, optional
Maximum number of iterations.
Defaults to 500.
convergence.iter : integer, optional
The algorithm ends when the clusters remain unchanged for the specified number of consecutive iterations.
Defaults to 100.
damping : double, optional
Controls the updating velocity. Value range: (0, 1).
Defaults to 0.9.
preference : double, optional
Determines the preference. Value range: [0, 1].
Defaults to 0.5.
seed.ratio : double, optional
Selects a portion (seed.ratio * data_number) of the input data as seed, where data_number is the number of rows in the input data. Value range: (0, 1]. If seed.ratio is 1, all the input data will be the seed.
Defaults to 1.
times : integer, optional
The sampling times. Only valid when seed.ratio is less than 1 and affinity is 'minkowski'.
Defaults to 3.
minkowski.power : integer, optional
The power of the Minkowski distance. Only valid when affinity is 'minkowski'.
Defaults to 1.
thread.ratio : double, optional
Controls the proportion of available threads that can be used by this function.
The value range is from 0 to 1, where 0 indicates a single thread and 1 indicates all available threads.
Values between 0 and 1 will use up to that percentage of available threads. Values outside this range are ignored.
Defaults to 0.
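To make the affinity options more concrete, the following base-R sketch shows the textbook definitions of five of the listed distance measures for two numeric vectors. It is an illustration only and not the PAL implementation: pairwise_distance is a hypothetical helper name, the exponent p = 3 is an arbitrary example value, cosine distance is assumed to be 1 minus the cosine similarity, and 'standardized.euclidean' is left out because it needs per-feature variances from the whole data set.

```r
# Hypothetical helper: textbook distance measures, NOT the PAL implementation.
pairwise_distance <- function(x, y, affinity = "euclidean", p = 3) {
  d <- abs(x - y)  # element-wise absolute differences
  switch(affinity,
    manhattan = sum(d),                # sum of absolute differences
    euclidean = sqrt(sum(d^2)),        # straight-line distance
    minkowski = sum(d^p)^(1 / p),      # generalisation with exponent p
    chebyshev = max(d),                # largest single-coordinate difference
    # Assumption: cosine distance taken as 1 - cosine similarity.
    cosine    = 1 - sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2))),
    stop("unsupported affinity: ", affinity)
  )
}

a <- c(0.10, 0.10)    # V1, V2 of row ID 1 in the example data
b <- c(10.13, 10.12)  # V1, V2 of row ID 21 in the example data
print(pairwise_distance(a, b, "manhattan"))
print(pairwise_distance(a, b, "chebyshev"))
```

With p = 1 the Minkowski form reduces to the Manhattan distance, and with p = 2 to the Euclidean distance.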
An "AffinityPropagation" object with the following attributes:
labels : DataFrame
Label assigned to each sample, structured as follows:
ID : record ID.
CLUSTER_ID : the range is from 0 to n.clusters - 1.
statistics : DataFrame
Statistic value, structured as follows:
STAT_NAME : Statistic name.
STAT_VALUE : Statistic value.
Input DataFrame data:
> data$Collect()
ID V1 V2
1 1 0.10 0.10
2 2 0.11 0.10
3 3 0.10 0.11
4 4 0.11 0.11
5 5 0.12 0.11
6 6 0.11 0.12
21 21 10.13 10.12
22 22 10.13 10.13
23 23 10.13 10.14
24 24 10.14 10.13
Call the function:
> ap <- hanaml.AffinityPropagation(data = data,
                                   key = "ID",
                                   affinity = "euclidean",
                                   n.clusters = 0L,
                                   max.iter = 500L,
                                   convergence.iter = 100L,
                                   damping = 0.9,
                                   preference = 0.5,
                                   times = 1L,
                                   seed.ratio = 1,
                                   minkowski.power = 0,
                                   thread.ratio = 0)
Output:
> ap$labels$Collect()
ID CLUSTER_ID
1 1 0
2 2 0
3 3 0
4 4 0
5 5 0
6 6 0
......
22 22 1
23 23 1
24 24 1
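Once the labels are collected into a local R data.frame (an assumption based on the printed output above), ordinary base-R tools can summarise them. The sketch below rebuilds only the rows that are visible in the output, leaving out the elided ones, and then counts members per cluster. The variable names labels, cluster_sizes and ids_by_cluster are illustrative.

```r
# Rows copied from the visible part of the output above;
# the rows hidden behind "......" are intentionally left out.
labels <- data.frame(
  ID         = c(1L, 2L, 3L, 4L, 5L, 6L, 22L, 23L, 24L),
  CLUSTER_ID = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L)
)

# Number of records assigned to each cluster.
cluster_sizes <- table(labels$CLUSTER_ID)

# Record IDs grouped by cluster label.
ids_by_cluster <- split(labels$ID, labels$CLUSTER_ID)

print(cluster_sizes)
print(ids_by_cluster)
```

The same two calls should work on the full collected result, since they only rely on the ID and CLUSTER_ID columns described for labels.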