Multi-Dimensional Scaling — hanaml.MDS • hana.ml.r

hanaml.MDS is an R wrapper for the SAP HANA PAL multidimensional scaling (MDS) algorithm.

hanaml.MDS(
  data,
  matrix.type,
  key,
  features = NULL,
  thread.ratio = NULL,
  dim = NULL,
  metric = NULL,
  minkowski.power = NULL
)

Arguments

data

DataFrame
DataFrame containing the data.

matrix.type

character

"observation.feature": Observation-feature matrix.
"dissimilarity": Dissimilarity matrix.

key

character
Name of the ID column.

features

character or list of characters, optional
Specifies the attribute columns to apply scaling to.
Defaults to all non-ID columns.

thread.ratio

double, optional
Controls the proportion of available threads that can be used by this function.
The value range is from 0 to 1, where 0 indicates a single thread, and 1 indicates all available threads.
Values between 0 and 1 will use up to that percentage of available threads. Values outside this range are ignored.
Defaults to 0.

dim

integer, optional
The number of dimensions to which the input dataset is reduced.
Defaults to 2.

metric

character, optional

"manhattan": Manhattan distance.
"euclidean": Euclidean distance.
"minkowski": Minkowski distance.

Only valid when matrix.type = "observation.feature".
Defaults to "euclidean".

minkowski.power

double, optional
Specifies the power (exponent) used when computing the Minkowski distance.
Only valid if matrix.type = "observation.feature" and metric = "minkowski".
Defaults to 3.
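
The three metric options can be explored locally before running the function on SAP HANA. The following is a minimal base-R sketch, assuming that the options follow the textbook distance definitions: the matrix x is hypothetical, stats::dist() stands in for the PAL computation (it is not part of hana.ml.r), and its p argument plays the role of minkowski.power.

```r
# Local base-R sketch; this does not call SAP HANA PAL.
# Hypothetical observation-feature matrix: 3 observations, 2 features.
x <- matrix(c(0, 0,
              3, 4,
              6, 8),
            ncol = 2, byrow = TRUE,
            dimnames = list(c("1", "2", "3"), c("X1", "X2")))

# Pairwise dissimilarities under each metric.
d.manhattan <- dist(x, method = "manhattan")
d.euclidean <- dist(x, method = "euclidean")
d.minkowski <- dist(x, method = "minkowski", p = 3)  # p = 3 mirrors the documented default

# as.matrix() turns the compact "dist" object into a full symmetric matrix,
# which has the same shape as a "dissimilarity" input.
print(as.matrix(d.euclidean))
```

A full symmetric matrix of this kind, with an added ID column, has the same layout as the input expected when matrix.type = "dissimilarity".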

Value

Returns a list of DataFrames:

DataFrame 1
Sampling results, structured as follows:
- DATA_ID: name as shown in input DataFrame.
- DIMENSION: dimension.
- VALUE: value.
DataFrame 2
Statistic results, structured as follows:
- STAT_NAME: statistic name.
- STAT_VALUE: statistic value.

Details

This function serves as a tool for dimensionality reduction or data visualization. It embeds samples from an N-dimensional space into a lower K-dimensional space by applying a non-linear transformation, classical multidimensional scaling. This transformation preserves the distances between entities as closely as possible after the reduction to the lower dimension.
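
To get a feel for the distance-preserving behaviour without a HANA connection, classical MDS can be tried locally. The following is a rough sketch that uses stats::cmdscale() from base R as a stand-in. It is not the PAL implementation, so signs, axis order, and numerical details may differ from what hanaml.MDS returns. The dissimilarity matrix is the one from the Examples section, and the k argument is assumed to correspond to dim.

```r
# Local base-R sketch of classical MDS; this does not call SAP HANA PAL.
# Dissimilarity matrix copied from the Examples section (ID column omitted).
d <- matrix(c(0.0000000, 0.9047814, 0.9085961, 0.9103063,
              0.9047814, 0.0000000, 0.2514457, 0.5975016,
              0.9085961, 0.2514457, 0.0000000, 0.4403572,
              0.9103063, 0.5975016, 0.4403572, 0.0000000),
            nrow = 4, byrow = TRUE)

# Embed the 4 entities in 2 dimensions (k is assumed to match dim = 2).
embedding <- cmdscale(as.dist(d), k = 2)

# Distances between the embedded points, as a full matrix.
d.embedded <- as.matrix(dist(embedding))

# Largest absolute deviation from the original dissimilarities:
# a small value means the distances were preserved well.
max.abs.error <- max(abs(d.embedded - d))
print(embedding)
print(max.abs.error)
```

Comparing d.embedded with d entry by entry shows which pairs of entities are distorted most by the reduction.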

Examples

Input DataFrame data:


 > data$Collect()
   ID        X1        X2        X3        X4
 1  1 0.0000000 0.9047814 0.9085961 0.9103063
 2  2 0.9047814 0.0000000 0.2514457 0.5975016
 3  3 0.9085961 0.2514457 0.0000000 0.4403572
 4  4 0.9103063 0.5975016 0.4403572 0.0000000

Call the function:


> mds <- hanaml.MDS(data,
                    key = "ID",
                    matrix.type = "dissimilarity",
                    thread.ratio = 0.5)

Output:


> mds$labels$Collect()
    ID DIMENSION       VALUE
  1  1         1  0.65191741
  2  1         2 -0.01585861
  3  2         1 -0.21773716
  4  2         2 -0.25319456
  5  3         1 -0.24990695
  6  3         2 -0.07294968
  7  4         1 -0.18427330
  8  4         2  0.34200285
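
The result is in long format, with one row per ID and dimension. For plotting or further analysis it is often handier to have one row per ID. The following is a minimal sketch of that step using base R's reshape() on a local data.frame. The values are copied from the output above, and the names long and wide are illustrative only and not part of hana.ml.r.

```r
# Local sketch: values copied from the output shown above.
long <- data.frame(
  ID        = rep(1:4, each = 2),
  DIMENSION = rep(1:2, times = 4),
  VALUE     = c( 0.65191741, -0.01585861,
                -0.21773716, -0.25319456,
                -0.24990695, -0.07294968,
                -0.18427330,  0.34200285)
)

# Pivot to one row per ID and one column per embedding dimension.
# reshape() names the new columns VALUE.1 and VALUE.2.
wide <- reshape(long, idvar = "ID", timevar = "DIMENSION", direction = "wide")
print(wide)
```

The columns VALUE.1 and VALUE.2 can then be passed to plot() to draw the two-dimensional embedding.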