MDS
- class hana_ml.algorithms.pal.preprocessing.MDS(matrix_type, thread_ratio=None, dim=None, metric=None, minkowski_power=None)
This class serves as a tool for dimensionality reduction or data visualization. Two input formats are supported: an \(N \times N\) dissimilarity matrix, or a usual entity–feature matrix. The former is a symmetric matrix in which each element represents the distance (dissimilarity) between two entities, while the latter is converted to a dissimilarity matrix using a method specified by the user.
- Parameters:
- matrix_type : {'dissimilarity', 'observation_feature'}
The type of the input DataFrame.
- thread_ratio : float, optional
Specifies the ratio of the total number of available threads that this function may use.
The value ranges from 0 to 1, where 0 means using only 1 thread, and 1 means using at most all the currently available threads.
Values outside this range are ignored, and the function determines the number of threads to use heuristically.
Defaults to 0.
- dim : int, optional
The number of dimensions to which the input dataset is reduced.
Defaults to 2.
- metric : {'manhattan', 'euclidean', 'minkowski'}, optional
The type of distance used when calculating the dissimilarity matrix.
Only valid when matrix_type is set to 'observation_feature'.
Defaults to 'euclidean'.
- minkowski_power : float, optional
When metric is set to 'minkowski', this parameter controls the value of the power.
Only valid when matrix_type is set to 'observation_feature' and metric is set to 'minkowski'.
Defaults to 3.
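To make the relationship between metric, minkowski_power, and the dissimilarity matrix concrete, the following is a minimal pure-Python sketch of how an entity–feature matrix could be converted into an \(N \times N\) dissimilarity matrix. It is not PAL's implementation: the helper names `minkowski_distance` and `dissimilarity_matrix` are hypothetical, and the sample data is made up. The sketch treats the Manhattan and Euclidean distances as the order-1 and order-2 special cases of the Minkowski distance.

```python
def minkowski_distance(a, b, p):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def dissimilarity_matrix(rows, metric="euclidean", minkowski_power=3.0):
    """Build an N x N dissimilarity matrix from N feature vectors.

    Hypothetical helper for illustration; argument names mirror the
    documented MDS parameters, but this is not the PAL code path.
    """
    # Manhattan and Euclidean are the p=1 and p=2 cases of Minkowski.
    powers = {"manhattan": 1.0, "euclidean": 2.0, "minkowski": minkowski_power}
    if metric not in powers:
        raise ValueError(f"unsupported metric: {metric!r}")
    p = powers[metric]
    n = len(rows)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = minkowski_distance(rows[i], rows[j], p)
            matrix[i][j] = matrix[j][i] = d  # symmetric by construction
    return matrix


# Illustrative data: three entities with two features each.
features = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
euclid = dissimilarity_matrix(features, metric="euclidean")
manhattan = dissimilarity_matrix(features, metric="manhattan")
```

The resulting matrix is symmetric with a zero diagonal, which matches the description of the 'dissimilarity' input format above.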
Examples
>>> from hana_ml.algorithms.pal.preprocessing import MDS
>>> mds = MDS(matrix_type='dissimilarity', dim=2, thread_ratio=0.5)
>>> res, stats = mds.fit_transform(data=df)
>>> res.collect()
>>> stats.collect()
- Attributes:
- None
Methods
fit_transform(data[, key, features]): Scaling of given datasets in multiple dimensions.
- fit_transform(data, key=None, features=None)
Scaling of given datasets in multiple dimensions.
- Parameters:
- data : DataFrame
DataFrame that contains the training data.
- key : str, optional
Name of the ID column in data.
Mandatory if data is not indexed, or if the index of data contains multiple columns.
Defaults to the single index column of data if not provided.
- features : str or list of str, optional
Names of the feature columns to be considered in the model.
If not specified, all columns except the key column are treated as feature columns.
- Returns:
- DataFrame
DataFrame 1, scaling result of data, structured as follows:
Data ID : IDs from data
DIMENSION : The dimension number in data
VALUE : Scaled value
DataFrame 2, statistics
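Because the scaling result is returned in a long layout (one row per entity and dimension), it is often convenient to pivot it into one coordinate vector per entity before plotting. The sketch below shows one way to do that in plain Python. The helper name `pivot_mds_result` is hypothetical, the rows are made-up values arranged in the documented (ID, DIMENSION, VALUE) column order, and it assumes the rows have already been fetched to the client, for example from the collected result.

```python
from collections import defaultdict


def pivot_mds_result(rows):
    """Turn (id, dimension, value) rows into {id: [coord_1, coord_2, ...]}."""
    by_id = defaultdict(dict)
    for entity_id, dimension, value in rows:
        by_id[entity_id][dimension] = value
    # Order each entity's coordinates by dimension number.
    return {eid: [dims[d] for d in sorted(dims)] for eid, dims in by_id.items()}


# Made-up rows in the documented (ID, DIMENSION, VALUE) layout;
# the numbers are illustrative only, not real MDS output.
long_rows = [
    (1, 1, 0.25), (1, 2, -0.50),
    (2, 1, -0.75), (2, 2, 0.10),
]
coords = pivot_mds_result(long_rows)
```

With dim=2, each entity then maps to an (x, y) pair that can be passed directly to a scatter plot.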
Inherited Methods from PALBase
Besides the methods mentioned above, the MDS class also inherits methods from the PALBase class; please refer to PAL Base for more details.