Silhouette is a method for validating how well data has been clustered. PAL provides a lightweight version of silhouette called Slight Silhouette. hanaml.SlightSilhouette is an R wrapper for this lightweight silhouette method.

hanaml.SlightSilhouette(
  data,
  features = NULL,
  label = NULL,
  distance.level = NULL,
  minkowski.power = NULL,
  normalization = NULL,
  thread.number = NULL,
  categorical.variable = NULL,
  category.weights = NULL
)

Arguments

data

DataFrame
DataFrame containing the data.

features

character or list of characters, optional
Names of the feature columns.
If not provided, defaults to all non-key columns of data.

label

character, optional
Name of the column that specifies the cluster label of each record.
Defaults to the last column of data if not provided.

distance.level

character, optional
Specifies how to compute the distance between the item and the cluster center.

  • "manhattan"

  • "euclidean"

  • "minkowski"

  • "chebyshev"

Defaults to "euclidean".

minkowski.power

double, optional
When Minkowski distance is used, this parameter controls the value of power.
Only valid when distance.level is "minkowski".
Defaults to 3.0.
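
As a rough illustration of what the four distance.level options mean, the sketch below computes each distance between one point and one cluster center in base R. The helper name point_distance is hypothetical and not part of hanaml; the formulas are the standard textbook definitions, which PAL is assumed to follow.

```r
# Illustrative helper (not part of hanaml): distance between a point `x` and a
# cluster center `center`, both numeric vectors of equal length.
point_distance <- function(x, center,
                           distance.level = "euclidean",
                           minkowski.power = 3.0) {
  d <- abs(x - center)
  switch(distance.level,
    manhattan = sum(d),                                        # sum of absolute differences
    euclidean = sqrt(sum(d^2)),                                # straight-line distance
    minkowski = sum(d^minkowski.power)^(1 / minkowski.power),  # general p-norm
    chebyshev = max(d),                                        # largest single difference
    stop("unknown distance.level: ", distance.level)
  )
}

x <- c(0.5, 0.5)
center <- c(3.5, 4.5)
point_distance(x, center, "manhattan")  # 7
point_distance(x, center, "euclidean")  # 5
point_distance(x, center, "chebyshev")  # 4
```

With minkowski.power = 1 the Minkowski distance coincides with Manhattan, and with minkowski.power = 2 it coincides with Euclidean.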

normalization

character, optional
Specifies the normalization type:

  • "no": no normalization.

  • "l1.norm": for each point X = (x1,x2,...,xn), the normalized value is X' = (x1/S,x2/S,...,xn/S), where S = |x1|+|x2|+...+|xn|.

  • "min.max": for each column C, take the minimum and maximum values of C, then set C[i] = (C[i]-min)/(max-min).

Defaults to "no".
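
The two normalization formulas can be sketched in base R as follows. The helper names l1_normalize and min_max_normalize are hypothetical and only mirror the formulas stated above; they are not part of hanaml, and PAL's handling of edge cases (an all-zero point, a constant column) is not covered here.

```r
# Illustrative helpers (not part of hanaml), applied to a numeric matrix
# whose rows are points and whose columns are features.
l1_normalize <- function(m) {
  s <- rowSums(abs(m))  # S = |x1| + |x2| + ... + |xn| for each point
  m / s                 # recycling divides every row by its own S
}

min_max_normalize <- function(m) {
  # Rescale each column to [0, 1]; a constant column would divide by zero.
  apply(m, 2, function(col) (col - min(col)) / (max(col) - min(col)))
}

m <- matrix(c(1, 3,
              2, 6,
              5, 5), ncol = 2, byrow = TRUE)
l1_normalize(m)       # rows: (0.25, 0.75), (0.25, 0.75), (0.5, 0.5)
min_max_normalize(m)  # columns: (0, 0.25, 1) and (0, 1, 2/3)
```

Note the difference in direction: "l1.norm" works per point (row), while "min.max" works per feature (column).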

thread.number

integer, optional
Number of threads.
Defaults to 1.

categorical.variable

character or list/vector of characters, optional
Indicates which features should be treated as categorical variables.
The default behavior depends on the column type:

  • "VARCHAR" and "NVARCHAR": categorical

  • "INTEGER" and "DOUBLE": continuous.

Valid only for columns of "INTEGER" type; ignored otherwise.
No default value.

category.weights

double, optional
Represents the weight of category attributes.
Defaults to 0.707.

Value

DataFrame containing the validation value of Slight Silhouette.
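
To give an intuition for the kind of score returned, here is a minimal base R sketch of a center-based silhouette. It rests on several assumptions that are not stated on this page: each cluster center is the mean of its members, the per-point score is (b - a) / max(a, b) with a the distance to the point's own center and b the distance to the nearest other center, and the result is the mean over all points. It handles numeric features with Euclidean distance only, so it is not expected to reproduce PAL's output for data with categorical features. The function name slight_silhouette_sketch is hypothetical.

```r
# Sketch only (not part of hanaml): center-based silhouette for numeric
# features with Euclidean distance. Categorical features and
# category.weights are not handled.
slight_silhouette_sketch <- function(m, cluster) {
  ids <- sort(unique(cluster))
  stopifnot(length(ids) >= 2)
  # One row per cluster: the mean of its members (assumed cluster center).
  centers <- do.call(rbind, lapply(ids, function(k) {
    colMeans(m[cluster == k, , drop = FALSE])
  }))
  s <- sapply(seq_len(nrow(m)), function(i) {
    # Euclidean distance from point i to every center.
    d <- sqrt(rowSums(sweep(centers, 2, m[i, ])^2))
    own <- match(cluster[i], ids)
    a <- d[own]        # distance to the point's own center
    b <- min(d[-own])  # distance to the nearest other center
    if (max(a, b) == 0) 0 else (b - a) / max(a, b)
  })
  mean(s)
}

m <- matrix(c( 0, 0,
               0, 2,
              10, 0,
              10, 2), ncol = 2, byrow = TRUE)
cluster <- c(0, 0, 1, 1)
slight_silhouette_sketch(m, cluster)  # 1 - 1/sqrt(101), about 0.90
```

Values close to 1 indicate that points lie much nearer to their own center than to any other center; values near 0 or below suggest overlapping or poorly separated clusters.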

Examples

Input DataFrame data:


> data$Collect()
   V000 V001 V002 CLUSTER
1   0.5    A  0.5       0
2   1.5    A  0.5       0
3   1.5    A  1.5       0
4   0.5    A  1.5       0
5   1.1    B  1.2       0
6   0.5    B 15.5       1
7   1.5    B 15.5       1
8   1.5    B 16.5       1
9   0.5    B 16.5       1
10  1.2    C 16.1       1
11 15.5    C 15.5       2
12 16.5    C 15.5       2
13 16.5    C 16.5       2
14 15.5    C 16.5       2
15 15.6    D 16.2       2
16 15.5    D  0.5       3
17 16.5    D  0.5       3
18 16.5    D  1.5       3
19 15.5    D  1.5       3
20 15.7    A  1.6       3
 

Call the function:


> res <- hanaml.SlightSilhouette(data,
                                 label = "CLUSTER",
                                 features = c("V000","V001","V002"),
                                 distance.level = "euclidean",
                                 normalization = "no",
                                 category.weights = 0.7,
                                 categorical.variable = "V001")

Output:


> print(res)
  VALIDATE_VALUE
1      0.9385944