SlightSilhouette
- hana_ml.algorithms.pal.clustering.SlightSilhouette(data, features=None, label=None, distance_level=None, minkowski_power=None, normalization=None, thread_number=None, categorical_variable=None, category_weights=None)
Silhouette is a method for validating the clustering of data; it provides a succinct graphical representation of how well each object lies within its cluster. SAP HANA PAL provides a light version of silhouette called slight silhouette.
- Parameters:
- data: DataFrame
DataFrame containing the data.
- features: a list of str, optional
The names of feature columns.
If features is not provided, it defaults to all non-label columns.
- label: str, optional
The name of the label column, which indicates the cluster ID.
If label is not provided, it defaults to the last column of data.
- distance_level: {'manhattan', 'euclidean', 'minkowski', 'chebyshev', 'cosine'}, optional
Specifies the method for computing the distance between a data point and the cluster center. The 'cosine' method is only valid when the accelerated = False condition is applied.
Defaults to 'euclidean'.
- minkowski_power: float, optional
Determines the power to be used in the Minkowski distance calculation. It is only applicable when the distance_level parameter is set to 'minkowski'.
Defaults to 3.0.
- normalization: {'no', 'l1_norm', 'min_max'}, optional
Specifies the type of normalization to be applied to the data points.
'no': No normalization is applied.
'l1_norm': This applies L1 normalization. For each point X (x1, x2, ..., xn), the normalized value will be X'(x1/S, x2/S, ..., xn/S), where S = |x1| + |x2| + ... + |xn|.
'min_max': The Min-Max normalization method is applied. For each column C, get the min and max value of C, and then C[i] = (C[i]-min)/(max-min).
Defaults to 'no'.
- thread_number: int, optional
Number of threads.
Defaults to 1.
- categorical_variable: str or a list of str, optional
Specifies which INTEGER columns should be treated as categorical, with all other INTEGER columns treated as continuous.
No default value.
- category_weights: float, optional
Represents the weight of category attributes.
Defaults to 0.707.
- Returns:
- DataFrame
A DataFrame containing the validation value of Slight Silhouette.
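The 'l1_norm' and 'min_max' formulas and the Minkowski distance described in the parameters above can be illustrated with a minimal pure-Python sketch. The helper names (l1_normalize, min_max_normalize, minkowski_distance) are hypothetical and are not part of hana_ml; the handling of all-zero points and constant columns is an assumption, since the description above does not specify it.

```python
def l1_normalize(point):
    """Divide each coordinate by S = |x1| + |x2| + ... + |xn|."""
    s = sum(abs(x) for x in point)
    if s == 0:
        # Assumption: an all-zero point is returned unchanged.
        return list(point)
    return [x / s for x in point]


def min_max_normalize(rows):
    """Rescale each column C so that C[i] = (C[i] - min) / (max - min)."""
    scaled_columns = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        span = hi - lo
        # Assumption: a constant column is mapped to 0.0.
        scaled_columns.append([(v - lo) / span if span else 0.0 for v in col])
    # Transpose back from columns to rows.
    return [list(row) for row in zip(*scaled_columns)]


def minkowski_distance(a, b, p=3.0):
    """Minkowski distance of order p (3.0 mirrors the documented default)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


l1_point = l1_normalize([1.0, -3.0])
scaled = min_max_normalize([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
manhattan = minkowski_distance([0.0, 0.0], [3.0, 4.0], p=1.0)
euclid = minkowski_distance([0.0, 0.0], [3.0, 4.0], p=2.0)
```

With p=1.0 the Minkowski distance reduces to the Manhattan distance and with p=2.0 to the Euclidean distance, which is why minkowski_power only matters when distance_level is 'minkowski'.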
Examples
Input data df:
>>> df.collect()
    V000 V001  V002  CLUSTER
0    0.5    A   0.5        0
1    1.5    A   0.5        0
...
18  15.5    D   1.5        3
19  15.7    A   1.6        3
Call the function:
>>> res = SlightSilhouette(data=df, label="CLUSTER")
Result:
>>> res.collect()
   VALIDATE_VALUE
0       0.9385944
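To give a feel for what a single validation value like the one above summarizes, here is a rough sketch of one common simplification of the silhouette score that measures distances to cluster centers rather than to every other point. It is an assumption that this resembles the PAL computation; the exact formula used by SlightSilhouette is not given in this section, and the function centroid_silhouette is hypothetical and runs locally on plain lists, not on a HANA DataFrame.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two numeric points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid_silhouette(points, labels):
    """Mean of (b - a) / max(a, b) over all points, where a is the distance
    to the point's own cluster center and b is the distance to the nearest
    other cluster center. Illustrative only, not the PAL implementation."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    if len(groups) < 2:
        raise ValueError("at least two clusters are required")

    # Cluster center = per-column mean of the cluster's members.
    centers = {
        label: [sum(col) / len(members) for col in zip(*members)]
        for label, members in groups.items()
    }

    total = 0.0
    for point, label in zip(points, labels):
        a = euclidean(point, centers[label])
        b = min(
            euclidean(point, center)
            for other, center in centers.items()
            if other != label
        )
        denom = max(a, b)
        total += (b - a) / denom if denom else 0.0
    return total / len(points)


# Two tight, well-separated clusters give a score close to 1.
points = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
labels = [0, 0, 1, 1]
score = centroid_silhouette(points, labels)
```

Values near 1 indicate that points sit much closer to their own cluster center than to any other; values near 0 or below suggest overlapping or poorly assigned clusters.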