MDS
- class hana_ml.algorithms.pal.preprocessing.MDS(matrix_type, thread_ratio=None, dim=None, metric=None, minkowski_power=None)
This class serves as a tool for dimensionality reduction or data visualization. Two input formats are supported by this function: an \(N \times N\) dissimilarity matrix, or a usual entity–feature matrix. The former is a symmetric matrix in which each element represents the distance (dissimilarity) between two entities, while the latter can be converted to a dissimilarity matrix using a method specified by the user.
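To make the two input formats concrete, the following is a minimal pure-Python sketch of how an entity–feature matrix can be turned into a symmetric dissimilarity matrix. It is only an illustration: the helper name `dissimilarity_matrix` and the sample data are hypothetical, Euclidean distance is assumed, and the code does not reflect how the PAL procedure computes distances internally.

```python
import math


def dissimilarity_matrix(rows):
    """Return the N x N matrix of pairwise Euclidean distances between rows.

    Hypothetical helper for illustration only; not part of hana_ml.
    """
    n = len(rows)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(rows[i], rows[j])
            # Distances are symmetric, so fill both halves of the matrix.
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix


# Three made-up entities, each described by two features.
features = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
D = dissimilarity_matrix(features)
for row in D:
    print(row)
```

The result has a zero diagonal and satisfies `D[i][j] == D[j][i]`, which is the shape expected of a dissimilarity matrix as described above.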
- Parameters:
- matrix_type : {'dissimilarity', 'observation_feature'}
The type of the input DataFrame.
- thread_ratio : float, optional
Specifies the ratio of the total number of threads that can be used by this function.
The value range is from 0 to 1, where 0 means using only 1 thread, and 1 means using at most all the currently available threads.
Values outside the range are ignored, and the function heuristically determines the number of threads to use.
Defaults to 0.
- dim : int, optional
The number of dimensions that the input dataset is to be reduced to.
Defaults to 2.
- metric : {'manhattan', 'euclidean', 'minkowski'}, optional
The type of distance used when calculating the dissimilarity matrix.
Only valid when matrix_type is set to 'observation_feature'.
Defaults to 'euclidean'.
- minkowski_power : float, optional
When metric is set to 'minkowski', this parameter controls the value of the power.
Only valid when matrix_type is set to 'observation_feature' and metric is set to 'minkowski'.
Defaults to 3.
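As a rough illustration of how the three metric options relate, the sketch below uses the standard textbook definition of the Minkowski distance. It assumes that PAL's 'minkowski' metric follows this standard formula, which is not confirmed here; the function name `minkowski_distance` and the sample points are hypothetical.

```python
def minkowski_distance(x, y, power):
    """Standard Minkowski distance between two equal-length points.

    Illustrative only; assumed (not confirmed) to match PAL's definition.
    """
    return sum(abs(a - b) ** power for a, b in zip(x, y)) ** (1.0 / power)


x, y = [0.0, 0.0], [3.0, 4.0]

# power=1 reduces to the Manhattan distance, power=2 to the Euclidean distance.
manhattan = minkowski_distance(x, y, 1)
euclidean = minkowski_distance(x, y, 2)
# power=3 corresponds to the documented default of minkowski_power.
cubic = minkowski_distance(x, y, 3)

print(manhattan, euclidean, cubic)
```

Under this definition, a larger power gives more weight to the largest coordinate difference, so the distance shrinks toward the maximum absolute difference as the power grows.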
Examples
>>> mds = MDS(matrix_type='dissimilarity', dim=2, thread_ratio=0.5)
>>> res, stats = mds.fit_transform(data=df)
>>> res.collect()
>>> stats.collect()
- Attributes:
- None
Methods
- fit_transform(data[, key, features]): Scaling of given datasets in multiple dimensions.
- get_model_metrics(): Get the model metrics.
- get_score_metrics(): Get the score metrics.
- fit_transform(data, key=None, features=None)
Scaling of given datasets in multiple dimensions.
- Parameters:
- data : DataFrame
DataFrame that contains the training data.
- key : str, optional
Name of the ID column in data.
Mandatory if data is not indexed, or if the index of data contains multiple columns.
Defaults to the single index column of data if not provided.
- features : str or list of str, optional
Names of the feature columns to be considered in the model.
If not specified, all columns except the key column are treated as feature columns.
- Returns:
- DataFrame
DataFrame 1, scaling result of data, structured as follows:
Data ID : IDs from data
DIMENSION : The dimension number in data
VALUE : Scaled value
DataFrame 2, statistics
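The scaling result described above is in long format, with one row per (ID, DIMENSION) pair. For plotting, it is often more convenient to have one coordinate list per entity. The sketch below shows one way this could be done in plain Python. The helper `pivot_scaling_result` is hypothetical, the sample rows are invented for illustration, and DIMENSION is assumed to be an integer index.

```python
from collections import defaultdict


def pivot_scaling_result(rows):
    """Group (id, dimension, value) rows into {id: [value per dimension]}.

    Hypothetical helper; coordinates are ordered by ascending dimension.
    """
    by_id = defaultdict(dict)
    for entity_id, dimension, value in rows:
        by_id[entity_id][dimension] = value
    return {
        entity_id: [dims[d] for d in sorted(dims)]
        for entity_id, dims in by_id.items()
    }


# Invented rows in (ID, DIMENSION, VALUE) order, for two entities and dim=2.
rows = [(1, 1, 0.5), (1, 2, -1.0), (2, 1, -0.5), (2, 2, 1.0)]
coords = pivot_scaling_result(rows)
print(coords)
```

Each entry of `coords` can then be used directly as a point in the reduced space, for example as the x and y values of a scatter plot when dim is 2.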
- get_model_metrics()
Get the model metrics.
- Returns:
- DataFrame
The model metrics.
- get_score_metrics()
Get the score metrics.
- Returns:
- DataFrame
The score metrics.
Inherited Methods from PALBase
Besides the methods mentioned above, the MDS class also inherits methods from the PALBase class. Please refer to PAL Base for more details.