SlightSilhouette

hana_ml.algorithms.pal.clustering.SlightSilhouette(data, features=None, label=None, distance_level=None, minkowski_power=None, normalization=None, thread_number=None, categorical_variable=None, category_weights=None)

Silhouette is a method for validating a clustering of data. It provides a succinct graphical representation of how well each object lies within its cluster. SAP HANA PAL provides a light version of silhouette called slight silhouette.
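To make the idea concrete, here is a minimal NumPy sketch of a center-based ("simplified") silhouette score. It assumes that the light version compares each point's distance to its own cluster center with its distance to the nearest other center, which is what the distance_level description suggests. The function name simplified_silhouette is hypothetical, and this is an illustration only, not PAL's actual implementation (it ignores categorical columns, normalization, and non-Euclidean distances).

```python
import numpy as np

def simplified_silhouette(X, labels):
    """Mean center-based silhouette. Illustrative sketch, not PAL's implementation."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    cluster_ids = np.unique(labels)
    if cluster_ids.size < 2:
        raise ValueError("at least two clusters are required")

    # Cluster centers are assumed to be the per-cluster means.
    centers = np.stack([X[labels == c].mean(axis=0) for c in cluster_ids])

    # Euclidean distance from every point to every center: shape (n_points, n_clusters).
    dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)

    own = labels[:, None] == cluster_ids[None, :]   # True at each point's own cluster
    a = dist[own]                                   # distance to own center
    b = np.where(own, np.inf, dist).min(axis=1)     # distance to nearest other center

    denom = np.maximum(a, b)
    safe = np.where(denom > 0, denom, 1.0)          # avoid division by zero
    s = np.where(denom > 0, (b - a) / safe, 0.0)
    return float(s.mean())

# Two well-separated clusters of numeric points.
points = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
cluster = [0, 0, 1, 1]
print(simplified_silhouette(points, cluster))
```

A score close to 1 indicates that points sit much nearer to their own center than to any other center; values near 0 or below suggest overlapping or misassigned clusters.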

Parameters:

data: DataFrame

DataFrame containing the data.

features: a list of str, optional

The names of feature columns.

If features is not provided, it defaults to all non-label columns.

label: str, optional

The name of the label column, which indicates the cluster ID of each row.

If label is not provided, it defaults to the last column of data.

distance_level: {'manhattan', 'euclidean', 'minkowski', 'chebyshev', 'cosine'}, optional

Specifies the method for computing the distance between a data point and the cluster center. The 'cosine' method is only valid when the accelerated = False condition is applied.

Defaults to 'euclidean'.

minkowski_power: float, optional

Determines the power to be used in the Minkowski distance calculation. It is only applicable when the distance_level parameter is set to 'minkowski'.

Defaults to 3.0.
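As an illustration of the distance_level options, the following sketch computes each distance between a point and a cluster center using the standard textbook definitions. The helper name point_to_center_distance is hypothetical, and defining the cosine distance as 1 minus the cosine similarity is an assumption; PAL's internal formulas may differ in detail.

```python
import numpy as np

def point_to_center_distance(x, center, distance_level="euclidean", minkowski_power=3.0):
    """Distance between a point and a center. Illustrative sketch only."""
    x = np.asarray(x, dtype=float)
    center = np.asarray(center, dtype=float)
    diff = np.abs(x - center)

    if distance_level == "manhattan":
        return float(diff.sum())
    if distance_level == "euclidean":
        return float(np.sqrt((diff ** 2).sum()))
    if distance_level == "minkowski":
        # minkowski_power only matters for this branch.
        return float((diff ** minkowski_power).sum() ** (1.0 / minkowski_power))
    if distance_level == "chebyshev":
        return float(diff.max())
    if distance_level == "cosine":
        # Assumed definition: 1 - cosine similarity (undefined for zero vectors).
        return float(1.0 - (x @ center) / (np.linalg.norm(x) * np.linalg.norm(center)))
    raise ValueError(f"unknown distance_level: {distance_level!r}")

for level in ("manhattan", "euclidean", "minkowski", "chebyshev"):
    print(level, point_to_center_distance([0.0, 0.0], [3.0, 4.0], level))
```

With minkowski_power set to 1 or 2, the Minkowski branch reduces to the Manhattan or Euclidean distance, respectively.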

normalization: {'no', 'l1_norm', 'min_max'}, optional

Specifies the type of normalization to be applied to the data points.

'no': No normalization is applied.
'l1_norm': L1 normalization is applied. Each point X(x₁, x₂, ..., x_n) is replaced by X'(x₁/S, x₂/S, ..., x_n/S), where S = |x₁| + |x₂| + ... + |x_n|.
'min_max': Min-max normalization is applied. For each column C, each value C[i] is replaced by (C[i] - min) / (max - min), where min and max are the minimum and maximum values of C.

Defaults to 'no'.
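The two normalization formulas above can be sketched in NumPy as follows. The function names are hypothetical, and returning zeros for an all-zero row or a constant column is an assumption made here to avoid division by zero; the source does not state how PAL handles those cases.

```python
import numpy as np

def l1_normalize(X):
    """Row-wise L1 normalization: divide each point by the sum of its absolute values."""
    X = np.asarray(X, dtype=float)
    s = np.abs(X).sum(axis=1, keepdims=True)
    # Assumption: all-zero rows stay zero instead of producing NaN.
    return np.divide(X, s, out=np.zeros_like(X), where=s != 0)

def min_max_normalize(X):
    """Column-wise min-max normalization: (C[i] - min) / (max - min)."""
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    # Assumption: constant columns map to zero instead of producing NaN.
    return np.divide(X - lo, span, out=np.zeros_like(X), where=span != 0)

sample = [[1.0, -3.0], [2.0, 2.0]]
print(l1_normalize(sample))
print(min_max_normalize(sample))
```

Note the difference in direction: 'l1_norm' rescales each row (point) independently, while 'min_max' rescales each column (feature) independently.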

thread_number: int, optional

Number of threads.

Defaults to 1.

categorical_variable: str or a list of str, optional

Specifies which INTEGER columns should be treated as categorical, with all other INTEGER columns treated as continuous.

No default value.

category_weights: float, optional

Represents the weight of categorical attributes.

Defaults to 0.707.

Returns:

DataFrame: A DataFrame containing the validation value of Slight Silhouette.

Examples

Input data df:

>>> df.collect()
    V000 V001 V002 CLUSTER
0    0.5    A  0.5       0
1    1.5    A  0.5       0
...
18  15.5    D  1.5       3
19  15.7    A  1.6       3

Call the function:

>>> res = SlightSilhouette(data=df, label="CLUSTER")

Result:

>>> res.collect()
  VALIDATE_VALUE
0      0.9385944