TSNE

class hana_ml.algorithms.pal.tsne.TSNE(n_iter=None, learning_rate=None, object_frequency=None, n_components=None, angle=None, exaggeration=None, thread_ratio=None, random_state=None, perplexity=None)

t-Distributed Stochastic Neighbor Embedding (t-SNE) is a non-linear dimensionality reduction technique that is particularly well suited to visualizing high-dimensional datasets by embedding them in a low-dimensional space (typically 2D or 3D).

Parameters:
thread_ratio : float, optional

Adjusts the percentage of available threads to use, from 0 to 1. A value of 0 indicates the use of a single thread, while 1 implies the use of all currently available threads. Values outside this range are ignored, and the function determines the number of threads heuristically.

Defaults to 0.0.

n_iter : int, optional

Specifies the maximum number of iterations for the TSNE algorithm.

Defaults to 250.

random_state : int, optional

The seed for random number generation.

Defaults to 0.

exaggeration : float, optional

Factor by which \(p_{ij}\) is multiplied before 250 iterations. A larger value separates the natural clusters more widely, leaving more empty space on the map.

Defaults to 12.0.
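
The effect of exaggeration can be sketched in plain Python. This is only an illustration, not PAL's implementation: the helper name `exaggerated_affinities`, the 0-based iteration count, and the exact cut-off at iteration 250 are assumptions made for the example.

```python
def exaggerated_affinities(p, iteration, exaggeration=12.0, exaggeration_iters=250):
    """Return the affinities used at a given (0-based) iteration.

    Assumption: the factor applies to iterations 0..exaggeration_iters-1
    and is dropped afterwards; the exact boundary used by PAL may differ.
    """
    factor = exaggeration if iteration < exaggeration_iters else 1.0
    return [[factor * value for value in row] for row in p]


# Toy 2x2 joint-probability matrix (illustrative values only).
p = [[0.0, 0.5], [0.5, 0.0]]
early = exaggerated_affinities(p, iteration=10)   # inside the exaggeration phase
late = exaggerated_affinities(p, iteration=300)   # after the exaggeration phase
```

During the early phase every \(p_{ij}\) is scaled by the same factor, so attractive forces dominate and clusters tighten; afterwards the unscaled values are used.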

angle : float, optional

The value must be between 0.0 and 1.0. Setting it to 0.0 selects the "exact" method, which runs in \(O(N^2)\) time; any other value makes TSNE employ the Barnes-Hut approximation, which runs in \(O(N \log N)\) time. For Barnes-Hut, this value is a tradeoff between accuracy and training speed: higher values train faster.

Defaults to 0.5.
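
A rough sketch of how `angle` selects the method, and of how the two growth terms compare, might look as follows. The helper names are hypothetical, the base-2 logarithm is an arbitrary choice, and the returned numbers are orders of growth, not measured run times.

```python
import math


def gradient_method(angle):
    """Map the documented `angle` value to the method it selects."""
    if not 0.0 <= angle <= 1.0:
        raise ValueError("angle must be between 0.0 and 1.0")
    return "exact" if angle == 0.0 else "barnes-hut"


def growth_term(n_samples, angle):
    """Dominant growth term: N^2 (exact) or N * log2(N) (Barnes-Hut).

    Illustrative only; constant factors and the effect of `angle`
    on Barnes-Hut accuracy are not modelled here.
    """
    if gradient_method(angle) == "exact":
        return float(n_samples ** 2)
    return n_samples * math.log2(n_samples)


exact_cost = growth_term(1024, angle=0.0)
approx_cost = growth_term(1024, angle=0.5)
```

For 1024 samples the two terms already differ by about two orders of magnitude, which is why a non-zero `angle` is usually preferable for larger datasets.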

n_components : int, optional

Dimension of the embedded space. Only 2 and 3 are valid values.

Defaults to 2.

object_frequency : int, optional

Frequency, in iterations, of calculating the objective function and writing the result to the OBJECTIVES table. This value should not be larger than the value assigned to n_iter.

Defaults to 50.

learning_rate : float, optional

Learning rate.

Defaults to 200.0.

perplexity : float, optional

The perplexity is related to the number of nearest neighbors. Larger values are suitable for larger datasets. Make sure perplexity * 3 < [no. of samples].

Defaults to 30.0.
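
The link between perplexity and the number of neighbors follows from the standard definition of perplexity as 2 raised to the Shannon entropy (in bits) of a distribution. The sketch below uses that general definition; the function name is hypothetical and nothing here reflects how PAL computes it internally.

```python
import math


def distribution_perplexity(probabilities):
    """Perplexity of a discrete distribution: 2 ** (Shannon entropy in bits)."""
    entropy = -sum(p * math.log2(p) for p in probabilities if p > 0.0)
    return 2.0 ** entropy


# A point that weights 4 neighbors equally has perplexity 4.
uniform_over_four = distribution_perplexity([0.25, 0.25, 0.25, 0.25])

# Concentrating the weight on fewer neighbors lowers the perplexity.
skewed = distribution_perplexity([0.7, 0.1, 0.1, 0.1])
```

A uniform distribution over k neighbors has perplexity k, which is why the parameter can be read as an effective neighborhood size.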

Examples

>>> from hana_ml.algorithms.pal.tsne import TSNE
>>> tsne = TSNE(n_iter=500, n_components=3, angle=0, object_frequency=50, random_state=30, perplexity=1.0)

Performing fit_predict():

>>> res, stats, obj = tsne.fit_predict(data=df_train, key='ID')
>>> res.collect()
>>> stats.collect()
>>> obj.collect()
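
Before fitting, the constraints documented for the parameters can be checked on the client side. The helper below is a hypothetical sketch, not part of hana_ml; its defaults mirror the documented ones, and `n_samples` is assumed to be the row count of the input data.

```python
def check_tsne_params(n_samples, n_iter=250, n_components=2, angle=0.5,
                      object_frequency=50, perplexity=30.0):
    """Return a list of violated constraints (empty if none).

    Hypothetical client-side helper; it only encodes the rules stated
    in the parameter descriptions.
    """
    problems = []
    if n_components not in (2, 3):
        problems.append("n_components must be 2 or 3")
    if not 0.0 <= angle <= 1.0:
        problems.append("angle must be between 0.0 and 1.0")
    if object_frequency > n_iter:
        problems.append("object_frequency must not exceed n_iter")
    if not perplexity * 3 < n_samples:
        problems.append("perplexity * 3 must be less than the number of samples")
    return problems


# Settings similar to the example, for a hypothetical 10-row dataset.
ok = check_tsne_params(n_samples=10, n_iter=500, n_components=3, angle=0,
                       object_frequency=50, perplexity=1.0)

# Default perplexity (30.0) is too large for only 50 samples.
bad = check_tsne_params(n_samples=50, n_components=4, object_frequency=300)
```

Returning all violations at once, rather than raising on the first, makes it easier to correct several settings in one pass.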

Methods

fit_predict(data, key[, features])

Alias of fit_transform().

fit_transform(data, key[, features])

Fit the TSNE model with input data.

fit_transform(data, key, features=None)

Fit the TSNE model with input data. Model parameters should be given by initializing the model first.

Parameters:
data : DataFrame

Data to be fit.

key : str, optional

Name of the ID column.

features : list of str or str, optional

Name(s) of the feature column(s).

If not specified, all columns in the input DataFrame except the key column are used as feature columns.

Returns:
DataFrames
  • Result table with coordinate value of different dimensions.

  • Table of statistical values.

  • Table of objective values of iterations.

fit_predict(data, key, features=None)

Alias of fit_transform(). Reserved for backward compatibility.

Inherited Methods from PALBase

Besides the methods mentioned above, the TSNE class also inherits methods from the PALBase class. Please refer to PAL Base for more details.