MDS

class hana_ml.algorithms.pal.preprocessing.MDS(matrix_type, thread_ratio=None, dim=None, metric=None, minkowski_power=None)

This class serves as a tool for dimensionality reduction or data visualization. Two input formats are supported: an \(N \times N\) dissimilarity matrix, or an ordinary entity–feature matrix. The former is a symmetric matrix in which each element represents the distance (dissimilarity) between two entities, while the latter can be converted to a dissimilarity matrix using a method specified by the user.
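As a minimal sketch of what a valid dissimilarity input looks like, the helper below checks the symmetry stated above. The function name `is_dissimilarity_matrix` is hypothetical and not part of hana_ml, and the zero-diagonal and non-negativity checks are assumptions based on the usual properties of a distance, not requirements documented here. The sample values are the matrix from the Examples section.

```python
def is_dissimilarity_matrix(matrix, tol=1e-9):
    """Return True if `matrix` is square, symmetric and non-negative
    with a zero diagonal (the last two checks are assumptions)."""
    n = len(matrix)
    # Every row must have as many entries as there are rows.
    if any(len(row) != n for row in matrix):
        return False
    for i in range(n):
        # An entity is assumed to have zero distance to itself.
        if abs(matrix[i][i]) > tol:
            return False
        for j in range(i + 1, n):
            # Distances are assumed non-negative and must be symmetric.
            if matrix[i][j] < 0 or abs(matrix[i][j] - matrix[j][i]) > tol:
                return False
    return True


# Columns X1..X4 of the example data, without the ID column.
d = [
    [0.000000, 0.904781, 0.908596, 0.910306],
    [0.904781, 0.000000, 0.251446, 0.597502],
    [0.908596, 0.251446, 0.000000, 0.440357],
    [0.910306, 0.597502, 0.440357, 0.000000],
]
print(is_dissimilarity_matrix(d))
```

A check like this could be run on collected data before choosing matrix_type='dissimilarity'. An input that fails it is more likely an entity–feature table.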

Parameters

matrix_type : {'dissimilarity', 'observation_feature'}

The type of the input DataFrame.

thread_ratio : float, optional

Specifies the ratio of the total number of threads that can be used by this function.

The value range is from 0 to 1, where 0 means only using 1 thread, and 1 means using at most all the currently available threads.

Values outside this range are ignored, and the function heuristically determines the number of threads to use.

Defaults to 0.

dim : int, optional

The number of dimensions to which the input dataset is reduced.

Defaults to 2.

metric : {'manhattan', 'euclidean', 'minkowski'}, optional

The distance type used when calculating the dissimilarity matrix.

Only valid when matrix_type is set to 'observation_feature'.

Defaults to 'euclidean'.

minkowski_power : float, optional

When metric is set to 'minkowski', this parameter specifies the power of the Minkowski distance.

Only valid when matrix_type is set to 'observation_feature' and metric is set to 'minkowski'.

Defaults to 3.
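To illustrate how metric and minkowski_power relate, the sketch below builds a dissimilarity matrix from an entity–feature matrix using the standard Minkowski distance, of which Manhattan (power 1) and Euclidean (power 2) are special cases. It runs locally in plain Python. The helper names `minkowski` and `dissimilarity_matrix` are hypothetical, and it is an assumption that PAL computes the distances with these textbook definitions.

```python
def minkowski(u, v, power):
    """Standard Minkowski distance between two equal-length vectors."""
    return sum(abs(a - b) ** power for a, b in zip(u, v)) ** (1.0 / power)


def dissimilarity_matrix(rows, metric="euclidean", minkowski_power=3.0):
    """Pairwise distances between feature rows (illustrative helper)."""
    if metric == "manhattan":
        power = 1.0
    elif metric == "euclidean":
        power = 2.0
    elif metric == "minkowski":
        power = minkowski_power
    else:
        raise ValueError("unsupported metric: %r" % (metric,))
    return [[minkowski(u, v, power) for v in rows] for u in rows]


# Two made-up entities with two features each.
points = [[0.0, 0.0], [3.0, 4.0]]
for name in ("manhattan", "euclidean", "minkowski"):
    print(name, dissimilarity_matrix(points, metric=name)[0][1])
```

For these two points the Manhattan distance is 7 and the Euclidean distance is 5. With the default power of 3 the Minkowski distance is the cube root of 91, roughly 4.5.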

Examples

Original data:

>>> df.collect()
   ID        X1        X2        X3        X4
0   1  0.000000  0.904781  0.908596  0.910306
1   2  0.904781  0.000000  0.251446  0.597502
2   3  0.908596  0.251446  0.000000  0.440357
3   4  0.910306  0.597502  0.440357  0.000000

Apply the multidimensional scaling:

>>> mds = MDS(matrix_type='dissimilarity', dim=2, thread_ratio=0.5)
>>> res, stats = mds.fit_transform(data=df)
>>> res.collect()
   ID  DIMENSION     VALUE
0   1          1  0.651917
1   1          2 -0.015859
2   2          1 -0.217737
3   2          2 -0.253195
4   3          1 -0.249907
5   3          2 -0.072950
6   4          1 -0.184273
7   4          2  0.342003

>>> stats.collect()
                          STAT_NAME  STAT_VALUE
0                        acheived K    2.000000
1  proportion of variation explaind    0.978901
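The result is in long format, with one row per (ID, DIMENSION) pair. As a rough sketch, the snippet below regroups the rows printed above into one coordinate list per entity, then compares one distance in the reduced space against the original dissimilarity. It works on values copied from the output above rather than on a live connection, and the variable names are illustrative only.

```python
import math

# (ID, DIMENSION, VALUE) rows copied from res.collect() above.
long_rows = [
    (1, 1, 0.651917), (1, 2, -0.015859),
    (2, 1, -0.217737), (2, 2, -0.253195),
    (3, 1, -0.249907), (3, 2, -0.072950),
    (4, 1, -0.184273), (4, 2, 0.342003),
]

# Group the values by entity ID, keyed by dimension number.
coords = {}
for entity_id, dimension, value in long_rows:
    coords.setdefault(entity_id, {})[dimension] = value

# Order each entity's values by dimension number to get a point.
points = {eid: [dims[k] for k in sorted(dims)] for eid, dims in coords.items()}

# Compare the embedded distance between entities 1 and 2
# with the original dissimilarity from the input matrix.
original_d12 = 0.904781
embedded_d12 = math.dist(points[1], points[2])
print(points[1], points[2])
print(embedded_d12, original_d12)
```

The distance between entities 1 and 2 in the two-dimensional result comes out near 0.90, close to the original 0.904781. This agrees with the high proportion of variation reported in the statistics.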

Attributes

None

Methods

fit_transform(data[, key, features])

Scaling of given datasets in multiple dimensions.

fit_transform(data, key=None, features=None)

Scaling of given datasets in multiple dimensions.

Parameters

data : DataFrame

DataFrame that contains the input data to be scaled.

key : str, optional

Name of the ID column in data.

Mandatory if data is not indexed, or the index of data contains multiple columns.

Defaults to the single index column of data if not provided.

features : str or list of str, optional

Names of the feature columns to be considered in the model.

If not specified, all columns except the key column are treated as feature columns.

Returns

DataFrame

DataFrame 1, scaling result of data, structured as follows:

Data ID : IDs from data

DIMENSION : The dimension number in the scaled result

VALUE : Scaled value

DataFrame 2, statistics
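The example in this page uses a dissimilarity matrix. For an entity–feature input, a call could look like the following sketch. The wrapper `scale_observations` and its argument names are hypothetical. It assumes `df` is a hana_ml DataFrame obtained from an existing connection, and it only uses the constructor and fit_transform parameters documented above. The import is deferred so that the function can be defined without a HANA environment.

```python
def scale_observations(df, key="ID", feature_cols=None):
    """Reduce an entity-feature DataFrame to two dimensions.

    `df` is assumed to be a hana_ml DataFrame. `key` and
    `feature_cols` are placeholder names for this sketch.
    """
    # Deferred import: hana_ml is only needed when the function is called.
    from hana_ml.algorithms.pal.preprocessing import MDS

    mds = MDS(
        matrix_type='observation_feature',
        dim=2,
        metric='minkowski',
        minkowski_power=3,
    )
    # Returns the scaling result and the statistics, in that order.
    res, stats = mds.fit_transform(data=df, key=key, features=feature_cols)
    return res, stats
```

Leaving `feature_cols` as None makes every column except the key a feature column, as described for the features parameter.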

property fit_hdbprocedure: Returns the generated hdbprocedure for fit.

property predict_hdbprocedure: Returns the generated hdbprocedure for predict.

Inherited Methods from PALBase

In addition to the methods described above, the MDS class inherits methods from the PALBase class. Please refer to PAL Base for more details.