TSNE
- class hana_ml.algorithms.pal.tsne.TSNE(n_iter=None, learning_rate=None, object_frequency=None, n_components=None, angle=None, exaggeration=None, thread_ratio=None, random_state=None, perplexity=None)
t-Distributed Stochastic Neighbor Embedding (t-SNE) is a non-linear dimensionality reduction technique that is well suited to visualizing high-dimensional datasets by embedding them in a lower-dimensional space (typically 2D or 3D).
- Parameters:
- thread_ratio : float, optional
Fraction of the available threads to use, from 0 to 1. A value of 0 uses a single thread, while 1 uses all currently available threads. Values outside this range are ignored, and the number of threads is then determined heuristically.
Defaults to 0.0.
- n_iter : int, optional
Specifies the maximum number of iterations for the TSNE algorithm.
Defaults to 250.
- random_state : int, optional
The seed for the random number generator.
Defaults to 0.
- exaggeration : float, optional
Factor by which \(p_{ij}\) is multiplied during the first 250 iterations. Larger values separate the natural clusters more, which leaves more empty space on the map.
Defaults to 12.0.
- angle : float, optional
Must be between 0.0 and 1.0. Setting it to 0.0 selects the "exact" method, which runs in \(O(N^2)\) time; otherwise TSNE uses the Barnes-Hut approximation, which runs in \(O(N \log N)\) time. For the Barnes-Hut approximation this value is a tradeoff between accuracy and training speed: higher values train faster.
Defaults to 0.5.
- n_components : int, optional
Dimension of the embedded space. Only the values 2 and 3 are valid.
Defaults to 2.
- object_frequency : int, optional
Frequency of calculating the objective function and writing the result to the OBJECTIVES table. This value should not be larger than the value assigned to n_iter.
Defaults to 50.
- learning_rate : float, optional
Learning rate.
Defaults to 200.0.
- perplexity : float, optional
The perplexity is related to the number of nearest neighbors. Larger values are suitable for larger datasets. Make sure that perplexity * 3 < [no. of samples].
Defaults to 30.0.
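The constraints stated in the parameter list above can be checked before a model is created. The following is a minimal sketch in plain Python. `check_tsne_params` is a hypothetical helper, not part of hana_ml, and it encodes only the rules written in this section, with defaults copied from the parameter list. How PAL itself reacts to invalid values is not covered here.

```python
def check_tsne_params(n_samples, n_iter=250, n_components=2, angle=0.5,
                      object_frequency=50, perplexity=30.0):
    """Return a list of problems found; an empty list means none were found.

    Hypothetical helper: it mirrors the documented constraints only and
    does not call hana_ml or PAL.
    """
    problems = []
    # The embedded space may only have 2 or 3 dimensions.
    if n_components not in (2, 3):
        problems.append("n_components must be 2 or 3")
    # angle == 0.0 selects the exact method; anything up to 1.0 is Barnes-Hut.
    if not 0.0 <= angle <= 1.0:
        problems.append("angle must be between 0.0 and 1.0")
    # The objective cannot be recorded less often than the run is long.
    if object_frequency > n_iter:
        problems.append("object_frequency must not be larger than n_iter")
    # Documented rule: perplexity * 3 < number of samples.
    if perplexity * 3 >= n_samples:
        problems.append("perplexity * 3 must be smaller than the number of samples")
    return problems
```

For instance, `check_tsne_params(n_samples=60)` reports the perplexity rule, because the default perplexity of 30.0 would need more than 90 samples.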
Examples
>>> tsne = TSNE(n_iter=500, n_components=3, angle=0, object_frequency=50, random_state=30, perplexity=1.0)
Performing fit_predict():
>>> res, stats, obj = tsne.fit_predict(data=df_train, key='ID')
>>> res.collect()
>>> stats.collect()
>>> obj.collect()
Methods
fit_predict(data, key[, features]): Alias of fit_transform().
fit_transform(data, key[, features]): Fit the TSNE model with input data.
- fit_transform(data, key, features=None)
Fit the TSNE model with input data. Model parameters should be given by initializing the model first.
- Parameters:
- data : DataFrame
Data to be fitted.
- key : str, optional
Name of the ID column.
- features : list of str or str, optional
Names of the feature columns.
If not specified, all columns in the input DataFrame except the key column are used as feature columns.
- Returns:
- DataFrames
Result table with the coordinate values for each dimension of the embedded space.
Table of statistical values.
Table of objective values recorded over the iterations.
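Judging from the Returns list and the example in the Examples section, the three DataFrames come back in the order result, statistics, objectives. A small wrapper can attach names to them so that callers do not depend on position. This is only a sketch: `TSNEOutput` and `run_tsne` are hypothetical names that do not exist in hana_ml, and `estimator` is assumed to be an already configured TSNE instance.

```python
from typing import Any, NamedTuple


class TSNEOutput(NamedTuple):
    """Named view of the three returned tables (hypothetical type)."""
    result: Any       # coordinates in the embedded space
    statistics: Any   # table of statistical values
    objectives: Any   # objective values recorded over the iterations


def run_tsne(estimator, data, key, features=None) -> TSNEOutput:
    """Call fit_transform() and label its three return values.

    Assumes the estimator returns exactly three objects in the documented
    order; unpacking raises ValueError otherwise.
    """
    res, stats, obj = estimator.fit_transform(data=data, key=key, features=features)
    return TSNEOutput(result=res, statistics=stats, objectives=obj)
```

With a real connection this would be used as `out = run_tsne(tsne, df_train, key='ID')`, followed by `out.result.collect()`.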
- fit_predict(data, key, features=None)
Alias of fit_transform(). Reserved for backward compatibility.
Inherited Methods from PALBase
In addition to the methods described above, the TSNE class inherits methods from the PALBase class. Please refer to PAL Base for more details.