SlightSilhouette

hana_ml.algorithms.pal.clustering.SlightSilhouette(data, features=None, label=None, distance_level=None, minkowski_power=None, normalization=None, thread_number=None, categorical_variable=None, category_weights=None)

Silhouette is a method for validating a clustering of data. It provides a succinct graphical representation of how well each object lies within its cluster. SAP HANA PAL provides a lightweight version of silhouette called slight silhouette.
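For intuition, here is a minimal plain-Python sketch of a center-based (simplified) silhouette. It assumes that the score of a point is (b - a) / max(a, b), where a is the distance to the point's own cluster center and b is the distance to the nearest other cluster center, and that the overall value is the mean over all points. These are assumptions made for illustration, the exact PAL computation may differ, and the helper name simplified_silhouette is hypothetical.

```python
import math
from collections import defaultdict


def simplified_silhouette(points, labels):
    """Mean center-based silhouette (illustrative, not the PAL implementation)."""
    # Group points by cluster label.
    groups = defaultdict(list)
    for p, lab in zip(points, labels):
        groups[lab].append(p)
    if len(groups) < 2:
        raise ValueError("at least two clusters are required")

    # Cluster center = per-dimension mean of the cluster's points.
    centers = {
        lab: [sum(col) / len(pts) for col in zip(*pts)]
        for lab, pts in groups.items()
    }

    scores = []
    for p, lab in zip(points, labels):
        a = math.dist(p, centers[lab])  # distance to own center
        # Distance to the nearest center of any other cluster.
        b = min(math.dist(p, c) for other, c in centers.items() if other != lab)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Two well-separated toy clusters; the score should be close to 1.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
labels = [0, 0, 1, 1]
score = simplified_silhouette(points, labels)
```

Values close to 1 suggest that points sit much nearer to their own center than to any other center; values near 0 or below suggest overlapping or misassigned clusters.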

Parameters
data: DataFrame

DataFrame containing the data.

features: list of str, optional

The names of feature columns.

If features is not provided, it defaults to all non-label columns.

label: str, optional

The name of the label column, which indicates the cluster ID.

If label is not provided, it defaults to the last column of data.

distance_level: {'manhattan', 'euclidean', 'minkowski', 'chebyshev', 'cosine'}, optional

Specifies the method for computing the distance between a data point and the cluster center. The 'cosine' method is only valid when the accelerated = False condition is applied.

Defaults to 'euclidean'.

minkowski_power: float, optional

Determines the power to be used in the Minkowski distance calculation. It is only applicable when the distance_level parameter is set to 'minkowski'.

Defaults to 3.0.
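As an illustration of what the distance_level options mean, the sketch below implements the standard textbook definitions of the five distances in plain Python. The function name distance is hypothetical, and it is an assumption that PAL uses exactly these definitions (in particular, cosine distance taken as 1 minus cosine similarity).

```python
import math


def distance(x, c, level="euclidean", minkowski_power=3.0):
    """Distance between point x and center c for a given distance_level name."""
    diffs = [abs(a - b) for a, b in zip(x, c)]
    if level == "manhattan":
        return sum(diffs)
    if level == "euclidean":
        return math.sqrt(sum(d * d for d in diffs))
    if level == "minkowski":
        # The power is only used for this distance level.
        return sum(d ** minkowski_power for d in diffs) ** (1.0 / minkowski_power)
    if level == "chebyshev":
        return max(diffs)
    if level == "cosine":
        # Cosine distance = 1 - cosine similarity; undefined for zero vectors.
        dot = sum(a * b for a, b in zip(x, c))
        norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in c))
        return 1.0 - dot / norm
    raise ValueError(f"unknown distance level: {level}")


x, c = (0.0, 0.0), (3.0, 4.0)
d_manhattan = distance(x, c, "manhattan")
d_euclidean = distance(x, c, "euclidean")
d_chebyshev = distance(x, c, "chebyshev")
d_minkowski = distance(x, c, "minkowski")  # uses the default power of 3.0
```

With a power of 1 the Minkowski distance reduces to Manhattan, and with a power of 2 it reduces to Euclidean.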

normalization: {'no', 'l1_norm', 'min_max'}, optional

Specifies the type of normalization to be applied to the data points.

  • 'no': No normalization is applied.

  • 'l1_norm': This applies L1 normalization. For each point X = (x1, x2, ..., xn), the normalized value is X' = (x1/S, x2/S, ..., xn/S), where S = |x1| + |x2| + ... + |xn|.

  • 'min_max': This applies Min-Max normalization. For each column C, each value C[i] is replaced by (C[i] - min) / (max - min), where min and max are the minimum and maximum values of C.

Defaults to 'no'.
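The two normalization formulas above can be sketched in plain Python as follows. The helper names l1_normalize and min_max_normalize are hypothetical, and the handling of all-zero rows and constant columns is an assumption made to avoid division by zero; it does not describe how PAL treats those cases.

```python
def l1_normalize(rows):
    """Divide each row by the sum of the absolute values of its entries."""
    out = []
    for row in rows:
        s = sum(abs(v) for v in row)
        # Leave all-zero rows unchanged to avoid dividing by zero (assumption).
        out.append([v / s for v in row] if s else list(row))
    return out


def min_max_normalize(rows):
    """Rescale each column to [0, 1] using that column's min and max."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [
            # Constant columns map to 0.0 to avoid dividing by zero (assumption).
            (v - lo) / (hi - lo) if hi != lo else 0.0
            for v, lo, hi in zip(row, lows, highs)
        ]
        for row in rows
    ]


rows = [[1.0, -3.0], [2.0, 2.0], [3.0, 7.0]]
l1 = l1_normalize(rows)       # row-wise: each point is scaled on its own
mm = min_max_normalize(rows)  # column-wise: each feature is scaled on its own
```

Note the difference in direction: 'l1_norm' works per point (row), while 'min_max' works per column.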

thread_number: int, optional

Number of threads.

Defaults to 1.

categorical_variable: str or a list of str, optional

Specifies which INTEGER columns should be treated as categorical, with all other INTEGER columns treated as continuous.

No default value.

category_weights: float, optional

Specifies the weight of categorical attributes.

Defaults to 0.707.

Returns
DataFrame

A DataFrame containing the validation value of Slight Silhouette.

Examples

Input data df:

>>> df.collect()
    V000 V001 V002 CLUSTER
0    0.5    A  0.5       0
1    1.5    A  0.5       0
...
18  15.5    D  1.5       3
19  15.7    A  1.6       3

Call the function:

>>> res = SlightSilhouette(data=df, label="CLUSTER")

Result:

>>> res.collect()
  VALIDATE_VALUE
0      0.9385944