MDS
- class hana_ml.algorithms.pal.preprocessing.MDS(matrix_type, thread_ratio=None, dim=None, metric=None, minkowski_power=None)
This class serves as a tool for dimensionality reduction or data visualization. Two input formats are supported: an \(N \times N\) dissimilarity matrix, or a usual entity–feature matrix. The former is a symmetric matrix in which each element represents the distance (dissimilarity) between two entities, while the latter is converted to a dissimilarity matrix using the distance metric specified by the user.
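As a quick sanity check before choosing `matrix_type='dissimilarity'`, the following sketch tests whether a table of values is square and symmetric. The helper name `is_dissimilarity_matrix` and the tolerance are hypothetical, and the zero-diagonal check is an added assumption based on the usual property that an entity has zero distance to itself; none of this is part of hana_ml.

```python
def is_dissimilarity_matrix(rows, tol=1e-9):
    """Return True if `rows` is square, symmetric, and has a zero diagonal.

    Hypothetical helper for illustration only; not part of hana_ml.
    """
    n = len(rows)
    # Every row must have exactly n entries for the matrix to be N x N.
    if any(len(row) != n for row in rows):
        return False
    for i in range(n):
        # Assumption: the distance from an entity to itself is zero.
        if abs(rows[i][i]) > tol:
            return False
        # Symmetry: d(i, j) must equal d(j, i).
        for j in range(i + 1, n):
            if abs(rows[i][j] - rows[j][i]) > tol:
                return False
    return True


# Columns X1..X4 of the example data shown in the Examples section.
d = [
    [0.000000, 0.904781, 0.908596, 0.910306],
    [0.904781, 0.000000, 0.251446, 0.597502],
    [0.908596, 0.251446, 0.000000, 0.440357],
    [0.910306, 0.597502, 0.440357, 0.000000],
]
print(is_dissimilarity_matrix(d))  # True
```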
- Parameters
- matrix_type : {'dissimilarity', 'observation_feature'}
The type of the input DataFrame.
- thread_ratio : float, optional
Specifies the ratio of the total number of threads that can be used by this function.
The value range is from 0 to 1, where 0 means using only 1 thread, and 1 means using at most all the currently available threads.
Values outside this range are ignored, and the function heuristically determines the number of threads to use.
Defaults to 0.
- dim : int, optional
The number of dimensions that the input dataset is to be reduced to.
Defaults to 2.
- metric : {'manhattan', 'euclidean', 'minkowski'}, optional
The type of distance used when calculating the dissimilarity matrix.
Only valid when matrix_type is set as 'observation_feature'.
Defaults to 'euclidean'.
- minkowski_power : float, optional
When metric is set as 'minkowski', this parameter controls the value of the power.
Only valid when matrix_type is set as 'observation_feature' and metric is set as 'minkowski'.
Defaults to 3.
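To make the relationship between `metric` and `minkowski_power` concrete, here is a minimal sketch of how an observation-feature table could be turned into a dissimilarity matrix using the standard Minkowski distance, of which Manhattan (power 1) and Euclidean (power 2) are special cases. The functions `minkowski` and `dissimilarity_matrix` are hypothetical illustrations; the source does not show how PAL performs this computation internally.

```python
def minkowski(a, b, p):
    """Standard Minkowski distance of order p between two feature vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def dissimilarity_matrix(rows, metric="euclidean", minkowski_power=3.0):
    """Build an N x N matrix of pairwise distances (illustrative only).

    The parameter names mirror the MDS arguments; the mapping of
    'manhattan' to p=1 and 'euclidean' to p=2 is the textbook definition,
    not a statement about PAL internals.
    """
    powers = {"manhattan": 1.0, "euclidean": 2.0, "minkowski": minkowski_power}
    p = powers[metric]
    return [[minkowski(a, b, p) for b in rows] for a in rows]


# Two made-up observations with two features each.
points = [[0.0, 0.0], [3.0, 4.0]]

euclid = dissimilarity_matrix(points, metric="euclidean")
manhattan = dissimilarity_matrix(points, metric="manhattan")
cubic = dissimilarity_matrix(points, metric="minkowski", minkowski_power=3.0)

print(euclid[0][1])     # 5.0
print(manhattan[0][1])  # 7.0
print(cubic[0][1])      # cube root of 91
```

Note that `minkowski_power` only affects the result when the metric is 'minkowski', matching the validity rule stated above.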
Examples
Original data:
>>> df.collect()
   ID        X1        X2        X3        X4
0   1  0.000000  0.904781  0.908596  0.910306
1   2  0.904781  0.000000  0.251446  0.597502
2   3  0.908596  0.251446  0.000000  0.440357
3   4  0.910306  0.597502  0.440357  0.000000
Apply the multidimensional scaling:
>>> mds = MDS(matrix_type='dissimilarity', dim=2, thread_ratio=0.5)
>>> res, stats = mds.fit_transform(data=df)
>>> res.collect()
   ID  DIMENSION     VALUE
0   1          1  0.651917
1   1          2 -0.015859
2   2          1 -0.217737
3   2          2 -0.253195
4   3          1 -0.249907
5   3          2 -0.072950
6   4          1 -0.184273
7   4          2  0.342003
>>> stats.collect()
                          STAT_NAME  STAT_VALUE
0                        acheived K    2.000000
1  proportion of variation explaind    0.978901
- Attributes
- None
Methods
- fit_transform(data[, key, features]): Scaling of given datasets in multiple dimensions.
- fit_transform(data, key=None, features=None)
Scaling of given datasets in multiple dimensions.
- Parameters
- data : DataFrame
DataFrame that contains the training data.
- key : str, optional
Name of the ID column in data.
Mandatory if data is not indexed, or if the index of data contains multiple columns.
Defaults to the single index column of data if not provided.
- features : str or list of str, optional
Names of the feature columns to be considered in the model.
If not specified, all columns except the key column are treated as feature columns.
- Returns
- DataFrame
DataFrame 1, scaling result of data, structured as follows:
Data ID : IDs from data
DIMENSION : The dimension number in data
VALUE : Scaled value
DataFrame 2, statistics
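Because the scaling result is returned in long format (one row per ID and dimension), plotting usually requires one coordinate vector per entity. The sketch below pivots such rows into a dictionary of coordinates in plain Python. The helper `to_wide` is hypothetical, and the records are copied from the example output above; in practice they would come from the collected result DataFrame.

```python
def to_wide(records):
    """Pivot (ID, DIMENSION, VALUE) rows into {ID: [value_dim1, value_dim2, ...]}.

    Hypothetical helper for illustration; not part of hana_ml.
    """
    coords = {}
    for entity_id, dimension, value in records:
        coords.setdefault(entity_id, {})[dimension] = value
    # Order each entity's values by dimension number.
    return {eid: [dims[k] for k in sorted(dims)] for eid, dims in coords.items()}


# Rows taken from the example output of res.collect() on this page.
records = [
    (1, 1, 0.651917), (1, 2, -0.015859),
    (2, 1, -0.217737), (2, 2, -0.253195),
    (3, 1, -0.249907), (3, 2, -0.072950),
    (4, 1, -0.184273), (4, 2, 0.342003),
]

wide = to_wide(records)
print(wide[1])  # [0.651917, -0.015859]
```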
- property fit_hdbprocedure
Returns the generated hdbprocedure for fit.
- property predict_hdbprocedure
Returns the generated hdbprocedure for predict.