Precomputed Distance Matrix as input data in UnifiedClustering
UnifiedClustering can run some clustering algorithms on a pre-computed distance matrix instead of raw feature data. The matrix must be supplied in either upper or lower triangular form: its expanded shape is N samples * N samples, and each pair of points must appear only once, as either (i, j) or (j, i), with a single distance value.
Currently, a pre-computed distance matrix is supported only for K-Medoids, and massive mode does not support this feature. To use a pre-computed distance matrix as input data in fit() and predict(), provide the input DataFrame in the following structure:
Input DataFrame Structure
1st column: INTEGER, VARCHAR, or NVARCHAR, Left Point.
2nd column: INTEGER, VARCHAR, or NVARCHAR, Right Point. The type must be the same as the left point type.
3rd column: DOUBLE, Distance.
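The three-column structure above can be sketched in plain Python as follows. This is a minimal illustration, not part of the library: the helper names `to_triangular_rows` and `has_unique_pairs`, the sample points, and the use of Euclidean distance are all assumptions made for the example. The source does not say whether diagonal pairs (i, i) are required, so the sketch leaves them out by default and offers a flag to include them.

```python
from itertools import combinations
from math import dist


def to_triangular_rows(points, include_diagonal=False):
    """Return (left, right, distance) rows for the upper triangle.

    `points` maps a point ID to its coordinates. Each unordered pair is
    emitted exactly once, with the smaller ID on the left.
    Hypothetical helper; Euclidean distance is an assumption.
    """
    ids = sorted(points)
    rows = [(i, j, dist(points[i], points[j])) for i, j in combinations(ids, 2)]
    if include_diagonal:
        # Whether the diagonal is needed is not stated in the source.
        rows = [(i, i, 0.0) for i in ids] + rows
    return rows


def has_unique_pairs(rows):
    """Check that no pair appears twice, whether as (i, j) or (j, i)."""
    seen = set()
    for left, right, _ in rows:
        pair = frozenset((left, right))
        if pair in seen:
            return False
        seen.add(pair)
    return True


# Illustrative data: three 2-D points keyed by integer ID.
points = {1: (0.0, 0.0), 2: (3.0, 4.0), 3: (6.0, 8.0)}
rows = to_triangular_rows(points)
```

For N points this yields N * (N - 1) / 2 rows, one per unordered pair. Loading the rows into a HANA table so they can be passed as a DataFrame is outside the scope of this sketch.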
K-Medoids accepts the following parameters when the input data is a pre-computed distance matrix:
Parameters
- n_clusters : int
Number of groups.
- tol : float, optional
Convergence threshold for exiting iterations.
Defaults to 1.0e-6.
- init : {'first_k', 'replace', 'no_replace'}, optional
Controls how the initial centers are selected:
'first_k': First k observations.
'replace': Random with replacement.
'no_replace': Random without replacement.
Defaults to 'first_k'.
- random_seed : int, optional
Indicates the seed used to initialize the random number generator. It can be set to 0 or a positive value.
0: Uses the system time;
Not 0: Uses the specified seed.
Defaults to -1.
- max_iter : int, optional
Maximum number of iterations.
Defaults to 100.
- thread_ratio : float, optional
Controls the proportion of available threads to use.
The value range is from 0 to 1, where 0 indicates a single thread, and 1 indicates all available threads.
Values between 0 and 1 will use up to that percentage of available threads.
Values outside this range tell PAL to heuristically determine the number of threads to use.
Defaults to 0.
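- precalculated_distance : bool, optional
Indicates whether the input data is a pre-computed distance matrix:
False: The input data is not a pre-computed distance matrix.
True: The input data is a pre-computed distance matrix.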
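The parameters listed above can be gathered into a single dictionary before being handed to the estimator. The following is a minimal sketch: `kmedoids_precomputed_params` is a hypothetical helper, not a library function, and it only checks the constraints that the list states explicitly. The commented call at the end assumes that UnifiedClustering takes the algorithm name through a `func` argument and that fit() takes the distance DataFrame through `data`; verify both against the installed hana-ml version.

```python
ALLOWED_INIT = {"first_k", "replace", "no_replace"}
OPTIONAL_NAMES = {"tol", "random_seed", "max_iter", "thread_ratio"}


def kmedoids_precomputed_params(n_clusters, init="first_k", **optional):
    """Collect the documented parameters into a dict (hypothetical helper)."""
    if not isinstance(n_clusters, int) or n_clusters < 1:
        raise ValueError("n_clusters must be a positive integer")
    if init not in ALLOWED_INIT:
        raise ValueError(f"init must be one of {sorted(ALLOWED_INIT)}")
    unknown = set(optional) - OPTIONAL_NAMES
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")
    # The flag marks the input as a pre-computed distance matrix.
    params = {"n_clusters": n_clusters, "init": init, "precalculated_distance": True}
    params.update(optional)
    return params


params = kmedoids_precomputed_params(2, init="no_replace", random_seed=1, max_iter=50)

# Assumed usage, not executed here (needs hana_ml and a HANA connection):
#   uc = UnifiedClustering(func="KMedoids", **params)
#   uc.fit(data=distance_df)  # distance_df: the three-column DataFrame
```

Keeping the parameters in a plain dictionary makes it easy to log or reuse one configuration across several runs.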