pairwise_distances
- hana_ml.algorithms.pal.stats.pairwise_distances(data, ref_data, thread_ratio=None, categorical_variable=None, string_variable=None, distance_metric=None, minkow_p=None, category_weights=None, string_weights=None, variable_weight=None)
Computes pairwise distances between two datasets.
- Parameters
- data : DataFrame
Input data for computing pairwise distances.
This DataFrame must be structured as follows:
1st column : ID, type INTEGER, BIGINT, VARCHAR, or NVARCHAR.
Other columns : VARCHAR, NVARCHAR, INTEGER, DOUBLE, REAL_VECTOR, or DECIMAL(p, s).
- ref_data : DataFrame
Reference data for computing pairwise distances.
This DataFrame must be structured as follows:
1st column : ID, type INTEGER, BIGINT, VARCHAR, or NVARCHAR.
Other columns : VARCHAR, NVARCHAR, INTEGER, DOUBLE, REAL_VECTOR, or DECIMAL(p, s). Must have the same structure as data.
- thread_ratio : float, optional
Specifies the ratio of threads used to execute the algorithm.
The value must be between 0 and 1.
Defaults to 1.
- categorical_variable : str or a list of str, optional
Specifies the categorical columns in data and ref_data. The columns specified in categorical_variable will be treated as categorical variables when calculating distances.
- string_variable : str or a list of str, optional
Specifies the string columns in data and ref_data. The columns specified in string_variable will be treated as text variables when calculating distances.
- distance_metric : {'manhattan', 'euclidean', 'minkowski', 'chebyshev'}, optional
Specifies the distance metric used to compute distances.
Defaults to 'euclidean'.
'manhattan': Manhattan distance.
'euclidean': Euclidean distance.
'minkowski': Minkowski distance.
'chebyshev': Chebyshev distance.
- minkow_p : int, optional
The parameter p for Minkowski distance. This parameter is valid only when distance_metric is set to 'minkowski'.
Defaults to 3.
- category_weights : float, optional
Weight of categorical variables. The value must be greater than or equal to 0.
Defaults to 0.707.
- string_weights : float, optional
Weight of text variables. The value must be greater than or equal to 0.
Defaults to 1.0.
- variable_weight : dict, optional
A Python dictionary that contains the weights of variables. Each key is a variable name and each value is the weight of that variable.
If a variable is not specified in variable_weight, it is assigned a default weight of 1.0.
Example for illustration: {'var1': 0.5, 'var2': 2.0}.
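As a rough illustration of what the four distance_metric options and minkow_p mean for purely numeric columns, the following sketch uses the textbook definitions in plain Python. It is not PAL's implementation: it ignores categorical and string columns and all weight parameters, and the helper names (minkowski, chebyshev, pairwise) and the dict-based inputs are assumptions made for this example only.

```python
def minkowski(x, y, p):
    """Textbook Minkowski distance of order p between two numeric vectors."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def chebyshev(x, y):
    """Largest absolute coordinate difference between two numeric vectors."""
    return max(abs(a - b) for a, b in zip(x, y))


def pairwise(data, ref_data, distance_metric="euclidean", minkow_p=3):
    """Return (ID, REF_ID, DISTANCE) rows for every data/ref_data pair.

    `data` and `ref_data` are hypothetical stand-ins for the two input
    DataFrames: dicts mapping an ID to a tuple of numeric features.
    """
    metrics = {
        "manhattan": lambda x, y: minkowski(x, y, 1),  # Minkowski with p = 1
        "euclidean": lambda x, y: minkowski(x, y, 2),  # Minkowski with p = 2
        "minkowski": lambda x, y: minkowski(x, y, minkow_p),
        "chebyshev": chebyshev,
    }
    dist = metrics[distance_metric]
    return [(i, r, dist(x, y))
            for i, x in data.items()
            for r, y in ref_data.items()]


data = {1: (0.0, 0.0), 2: (3.0, 4.0)}
ref_data = {10: (3.0, 4.0)}

for metric in ("manhattan", "euclidean", "minkowski", "chebyshev"):
    print(metric, pairwise(data, ref_data, metric))
```

For the pair (1, 10) the coordinate differences are 3 and 4, so the Manhattan distance is 7, the Euclidean distance is 5, and the Chebyshev distance is 4. The pair (2, 10) has identical features, so every metric gives 0.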
- Returns
- DataFrame
Pairwise distances, with columns ID, REF_ID, and DISTANCE.
- DataFrame
Statistics, with columns STAT_NAME and STAT_VALUE.
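A minimal sketch of how a call might look is given below. It makes several assumptions: `conn` is an already-open hana_ml ConnectionContext, the table names passed in are placeholders for tables that follow the structure described for data and ref_data, and the two result DataFrames are returned together as a tuple in the order listed under Returns. The wrapper function run_pairwise_distances is hypothetical and exists only to keep the example self-contained.

```python
def run_pairwise_distances(conn, data_table, ref_table):
    """Compute Minkowski (p=3) pairwise distances between two HANA tables.

    `conn` is assumed to be an open hana_ml ConnectionContext;
    `data_table` and `ref_table` are placeholder table names.
    """
    # Imported here so the sketch can be loaded without hana_ml installed.
    from hana_ml.algorithms.pal.stats import pairwise_distances

    data = conn.table(data_table)
    ref_data = conn.table(ref_table)

    # Assumption: the two result DataFrames come back as a tuple.
    distances, stats = pairwise_distances(
        data=data,
        ref_data=ref_data,
        thread_ratio=0.5,
        distance_metric='minkowski',
        minkow_p=3,
    )
    return distances, stats
```

Calling `.collect()` on either returned DataFrame would fetch its rows to the client for inspection.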