trustworthiness
- hana_ml.algorithms.pal.decomposition.trustworthiness(data, embedding, distance_level=None, minkowski_power=None, embedded_distance_level=None, embedded_minkowski_power=None, distance_method=None, embedded_knn_method=None, max_neighbors_trustworthiness=None, thread_ratio=None)
Calculate the trustworthiness of the embedding.
- Parameters
- data : DataFrame
Input data.
- embedding : DataFrame
Embedded data.
- distance_level : {'manhattan', 'euclidean', 'minkowski', 'chebyshev', 'standardized_euclidean', 'cosine'}, optional
The distance metric used in the original high-dimensional space. The following distance levels are available:
'manhattan' : Manhattan distance
'euclidean' : Euclidean distance
'minkowski' : Minkowski distance
'chebyshev' : Chebyshev distance
'standardized_euclidean' : Standardized Euclidean distance
'cosine' : Cosine distance
Defaults to 'euclidean'.
- minkowski_power : float, optional
The power parameter for the Minkowski distance metric in the original high-dimensional space. Only used if distance_level is set to 'minkowski'.
Defaults to 3.0.
- embedded_minkowski_power : float, optional
The power parameter for the Minkowski distance metric in the embedded space. Only used if embedded_distance_level is set to 'minkowski'.
Defaults to 3.0.
- distance_method : {'brute_force', 'matrix_enabled'}, optional
The method used to calculate distances in the original high-dimensional space when calculating trustworthiness. The following methods are available:
'brute_force' : Use formula to calculate distances
'matrix_enabled' : Matrix-enabled calculation
Defaults to knn_method.
- embedded_knn_method : {'brute_force', 'matrix_enabled', 'kd_tree'}, optional
The method used to compute the k-nearest neighbors of the embedded data when calculating trustworthiness. The following methods are available:
'brute_force' : Brute Force searching
'matrix_enabled' : Matrix-enabled searching
'kd_tree' : KD-Tree searching
Defaults to 'brute_force'.
- max_neighbors_trustworthiness : int, optional
The maximum number of neighbors to consider when calculating trustworthiness.
Defaults to min(15, int(2(N+1)/3-1e-8)), where N is the number of data points.
- thread_ratio : float, optional
The proportion of available threads to use, from 0 to 1. A value of 0 uses a single thread, while 1 uses all currently available threads. Values outside this range are ignored, in which case the function determines the number of threads heuristically.
Defaults to 1.0.
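The distance levels listed above can be illustrated with a small NumPy sketch. It uses the textbook definitions of each metric and is not taken from the PAL implementation; the helper name `distance` and its `std` argument (per-feature standard deviations for the standardized Euclidean case) are hypothetical and introduced only for illustration.

```python
import numpy as np

def distance(a, b, level="euclidean", minkowski_power=3.0, std=None):
    """Textbook distance between two vectors (illustrative, not PAL code)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    diff = np.abs(a - b)
    if level == "manhattan":
        return float(diff.sum())
    if level == "euclidean":
        return float(np.sqrt((diff ** 2).sum()))
    if level == "minkowski":
        # minkowski_power plays the role of p in (sum |a_i - b_i|^p)^(1/p)
        return float((diff ** minkowski_power).sum() ** (1.0 / minkowski_power))
    if level == "chebyshev":
        return float(diff.max())
    if level == "standardized_euclidean":
        # Assumption: the caller supplies per-feature standard deviations
        s = np.asarray(std, dtype=float)
        return float(np.sqrt(((diff / s) ** 2).sum()))
    if level == "cosine":
        # Cosine distance = 1 - cosine similarity
        return float(1.0 - a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    raise ValueError(f"unknown distance level: {level!r}")
```

For instance, `distance([0, 0], [3, 4], level='manhattan')` sums the absolute coordinate differences, while `level='chebyshev'` takes only the largest one.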
- Returns
- DataFrame
Trustworthiness of the embedding.
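Judging from the example output, the result has one row per neighbor count up to max_neighbors_trustworthiness. When that parameter is omitted, the documented default can be evaluated as in the sketch below. The helper name `default_max_neighbors` is hypothetical, and reading `2(N+1)` as the product `2 * (N + 1)` is an assumption about the formula's notation.

```python
def default_max_neighbors(n_points):
    """Evaluate min(15, int(2(N+1)/3 - 1e-8)) for N data points.

    Assumption: 2(N+1) in the documented formula means 2 * (N + 1).
    """
    return min(15, int(2 * (n_points + 1) / 3 - 1e-8))
```

The small `1e-8` offset means that when `2 * (N + 1) / 3` is a whole number, the result is rounded down to the next lower integer.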
Examples
>>> from hana_ml.algorithms.pal.decomposition import UMAP, trustworthiness
>>> umap = UMAP(n_neighbors=5, n_components=2, knn_method='brute_force',
...             init='random', min_dist=0.1, distance_method='brute_force',
...             embedded_knn_method='brute_force', seed=12345)
>>> embedding = umap.fit_transform(data=df, key='ID', features=['X1', 'X2', 'X3'])
>>> res = trustworthiness(data=df, embedding=embedding, distance_level='euclidean',
...                       distance_method='brute_force',
...                       embedded_knn_method='brute_force',
...                       max_neighbors_trustworthiness=5)
>>> res.collect()
   NEIGHBORS  TRUSTWORTHINESS
0          1         1.000000
1          2         0.952381
2          3         1.000000
3          4         0.962963
4          5         0.877778
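For intuition about what the score measures, the following is a minimal NumPy sketch of the commonly used trustworthiness definition (Venna and Kaski), which penalizes points that are among the k nearest neighbors in the embedding but not in the original space. It is an assumption that PAL follows this definition; its normalization and tie handling may differ, so the numbers need not match the output above. The function name `trustworthiness_sketch` is hypothetical, and only Euclidean distance with brute-force search is covered.

```python
import numpy as np

def trustworthiness_sketch(X, Y, k):
    """Standard trustworthiness of embedding Y for data X with k neighbors.

    Illustrative only; not the PAL implementation.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n = X.shape[0]
    if not 0 < k < (2 * n - 1) / 3:
        raise ValueError("k must satisfy 0 < k < (2n - 1) / 3")

    # Brute-force pairwise Euclidean distances in both spaces
    d_orig = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    d_emb = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
    # Push each point's distance to itself to the end of its ranking
    np.fill_diagonal(d_orig, np.inf)
    np.fill_diagonal(d_emb, np.inf)

    # rank_orig[i, j] = rank of j among i's neighbors in the original
    # space (1 = nearest)
    rows = np.arange(n)[:, None]
    order_orig = np.argsort(d_orig, axis=1, kind="stable")
    rank_orig = np.empty_like(order_orig)
    rank_orig[rows, order_orig] = np.arange(1, n + 1)

    # k nearest neighbors of each point in the embedded space
    knn_emb = np.argsort(d_emb, axis=1, kind="stable")[:, :k]

    # Penalize embedded neighbors whose original-space rank exceeds k
    penalty = np.maximum(rank_orig[rows, knn_emb] - k, 0).sum()
    return 1.0 - 2.0 / (n * k * (2 * n - 3 * k - 1)) * penalty
```

An embedding that preserves every neighborhood, such as a copy of the input itself, incurs no penalty and scores 1; lower values indicate that the embedding places points close together that were far apart in the original space.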