TSNE
- class hana_ml.algorithms.pal.tsne.TSNE(n_iter=None, learning_rate=None, object_frequency=None, n_components=None, angle=None, exaggeration=None, thread_ratio=None, random_state=None, perplexity=None)
Class for t-distributed Stochastic Neighbour Embedding (t-SNE).
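The "t-distributed" in the name refers to the heavy-tailed Student-t kernel (one degree of freedom) that standard t-SNE uses to measure similarity between points in the embedded space. The pure-Python sketch below computes those normalized affinities \(q_{ij}\) for a few 2-D points. It follows the textbook formulation and is not taken from PAL's implementation; the function name `student_t_affinities` is hypothetical.

```python
from itertools import product


def student_t_affinities(points):
    """Return the matrix q[i][j] of normalized Student-t affinities.

    Textbook t-SNE (illustrative sketch, not PAL's code):
        q_ij = (1 + ||y_i - y_j||^2)^-1 / sum_{k != l} (1 + ||y_k - y_l||^2)^-1
    with q_ii = 0 by definition.
    """
    n = len(points)
    # Unnormalized kernel values; the diagonal stays 0.
    kernel = [[0.0] * n for _ in range(n)]
    for i, j in product(range(n), repeat=2):
        if i != j:
            sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            kernel[i][j] = 1.0 / (1.0 + sq_dist)
    # Normalize over all off-diagonal pairs so the entries sum to 1.
    total = sum(sum(row) for row in kernel)
    return [[value / total for value in row] for row in kernel]


if __name__ == "__main__":
    q = student_t_affinities([(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)])
    for row in q:
        print(["%.4f" % v for v in row])
```

Because the kernel decays only polynomially with distance, moderately distant points still receive non-negligible affinity, which is what lets t-SNE spread clusters apart on the map.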
- Parameters
- thread_ratio : float, optional
Specifies the ratio of available threads to use.
0 : single thread
0~1 : percentage of available threads
Defaults to 0.0.
- n_iter : int, optional
Specifies the maximum number of iterations for the TSNE algorithm.
Defaults to 250.
- random_state : int, optional
The seed for random number generation.
Defaults to 0.
- exaggeration : float, optional
Factor by which \(p_{ij}\), the pairwise similarity of samples i and j in the input space, is multiplied during the first 250 iterations.
Larger values separate the natural clusters more widely, leaving more empty space on the map.
Defaults to 12.0.
- angle : float, optional
Must be between 0.0 and 1.0.
Setting it to 0.0 selects the "exact" method, which runs in \(O(N^2)\) time; any other value selects the Barnes-Hut approximation, which runs in \(O(N\log N)\) time.
For the Barnes-Hut approximation, this value trades accuracy against training speed: higher values train faster but less accurately.
Defaults to 0.5.
- n_components : int, optional
Dimension of the embedded space.
Must be 2 or 3.
Defaults to 2.
- object_frequency : int, optional
Frequency, in iterations, with which the objective function is calculated and its value written to the OBJECTIVES table.
This value should not be larger than n_iter.
Defaults to 50.
- learning_rate : float, optional
The learning rate of the optimization.
Defaults to 200.0.
- perplexity : float, optional
The perplexity is related to the number of nearest neighbors considered for each sample.
Larger values are suitable for larger datasets.
Make sure that perplexity * 3 is less than the number of samples.
Defaults to 30.0.
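The constraints listed above can be checked on the client side before a call reaches PAL. The sketch below is a hypothetical helper, `check_tsne_params`, that is not part of hana_ml; PAL may report these errors itself, and the helper only makes the documented rules concrete. Its defaults mirror the documented defaults, and it returns the method implied by `angle`: "exact" matches the method name shown in the example statistics table, while "barnes-hut" is an illustrative label, not necessarily the string PAL reports.

```python
def check_tsne_params(n_samples, n_iter=250, object_frequency=50,
                      n_components=2, angle=0.5, perplexity=30.0):
    """Validate the documented TSNE parameter rules (hypothetical helper).

    Raises ValueError on a violation; otherwise returns the method that
    the angle value implies.
    """
    if not 0.0 <= angle <= 1.0:
        raise ValueError("angle must be between 0.0 and 1.0")
    if n_components not in (2, 3):
        raise ValueError("n_components must be 2 or 3")
    if object_frequency > n_iter:
        raise ValueError("object_frequency must not exceed n_iter")
    if not perplexity * 3 < n_samples:
        raise ValueError("perplexity * 3 must be less than the number of samples")
    # angle == 0.0 means the exact O(N^2) method; anything else uses Barnes-Hut.
    return "exact" if angle == 0.0 else "barnes-hut"


if __name__ == "__main__":
    # Values used in the example below: 6 samples, perplexity=1.0, angle=0.
    print(check_tsne_params(6, n_iter=500, object_frequency=50,
                            n_components=3, angle=0, perplexity=1.0))
```

With only 6 samples, the default perplexity of 30.0 would violate the perplexity rule, which is why the example uses a much smaller value.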
Examples
Input dataframe for fit and predict:
>>> df_train.collect()
   ID  ATT1  ATT2  ATT3  ATT4  ATT5
0   1   1.0   2.0 -10.0 -20.0   3.0
1   2   4.0   5.0 -30.0 -10.0   6.0
2   3   7.0   8.0 -40.0 -50.0   9.0
3   4  10.0  11.0 -25.0 -15.0  12.0
4   5  13.0  14.0 -12.0 -24.0  15.0
5   6  16.0  17.0  -9.0 -13.0  18.0
Create a TSNE instance:
>>> tsne = TSNE(n_iter=500, n_components=3, angle=0, object_frequency=50, random_state=30, perplexity=1.0)
Perform fit_predict() on the given DataFrame:
>>> res, stats, obj = tsne.fit_predict(data=df_train, key='ID')
>>> res.collect()
   ID           x           y           z
0   1    4.875853 -189.090497 -229.536424
1   2  -67.675459  213.661740  178.397623
2   3  -68.852910  162.710853  284.966271
3   4  -68.056108  193.118052  220.275439
4   5   76.524624 -189.850926 -227.625750
5   6  123.184000 -190.549221 -226.477160
>>> stats.collect()
   STAT_NAME           STAT_VALUE
0     method                exact
1       iter                  500
2  objective  0.12310845438143747
>>> obj.collect()
   ITER  OBJ_VALUE
0    50  50.347530
1   100  50.982194
2   150  49.368419
3   200  70.201283
4   250  63.717535
5   300   1.296687
6   350   0.882636
7   400   0.260532
8   450   0.174178
9   500   0.123108
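The objective table in this example has one row every object_frequency iterations, and the exaggeration parameter applies only to the first 250 iterations. The sketch below models that per-iteration schedule in plain Python. The generator name `tsne_schedule` is hypothetical, and two details are assumptions: that iterations are counted from 1, and that iteration 250 is the last exaggerated one.

```python
def tsne_schedule(n_iter=250, object_frequency=50, exaggeration=12.0,
                  exaggeration_iters=250):
    """Yield (iteration, p_multiplier, records_objective) per iteration.

    Hypothetical model of the documented behaviour, not PAL's code:
    p_ij is multiplied by `exaggeration` during the first
    `exaggeration_iters` iterations (assumed to be 1..250), and the
    objective is recorded every `object_frequency` iterations.
    """
    for it in range(1, n_iter + 1):
        multiplier = exaggeration if it <= exaggeration_iters else 1.0
        yield it, multiplier, it % object_frequency == 0


if __name__ == "__main__":
    # Same settings as the example: n_iter=500, object_frequency=50.
    schedule = list(tsne_schedule(n_iter=500, object_frequency=50))
    print([it for it, _, recorded in schedule if recorded])
```

The recorded iterations it produces (50, 100, ..., 500) line up with the ITER column of `obj` above. The sharp drop in OBJ_VALUE between iterations 250 and 300 is consistent with the exaggeration phase ending after iteration 250, although that is an interpretation of the output rather than something the output states.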
- Attributes
fit_hdbprocedure
Returns the generated hdbprocedure for fit.
predict_hdbprocedure
Returns the generated hdbprocedure for predict.
Methods
fit_predict(data, key[, features])
Alias of fit_transform().
fit_transform(data, key[, features])
Fit the TSNE model with input data.
- fit_transform(data, key, features=None)
Fit the TSNE model with input data. Model parameters must be set when initializing the TSNE instance.
- Parameters
- data : DataFrame
Data to be fit.
- key : str
Name of the ID column.
- features : list of str or str, optional
Name(s) of the feature columns.
If not specified, all columns in the input DataFrame except the key column are used as features.
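The default feature selection described above can be sketched as a small helper. `resolve_features` is hypothetical and not part of hana_ml; treating a single string as the name of one feature column is an assumption based on the documented `str` type.

```python
def resolve_features(columns, key, features=None):
    """Return the feature column names to use (hypothetical helper).

    - features is None: every column except `key` is a feature.
    - features is a str: assumed to name a single feature column.
    - otherwise: features is taken as an iterable of column names.
    """
    if features is None:
        return [col for col in columns if col != key]
    if isinstance(features, str):
        return [features]
    return list(features)


if __name__ == "__main__":
    # Column names of df_train from the example.
    cols = ['ID', 'ATT1', 'ATT2', 'ATT3', 'ATT4', 'ATT5']
    print(resolve_features(cols, 'ID'))
```

With the example data, omitting `features` therefore means that all five ATT columns are embedded; for a hana_ml DataFrame, the column list is available as `df_train.columns`.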
- Returns
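- DataFrames
A tuple of three DataFrames:
Result table with the coordinates of each sample in each dimension of the embedded space.
Table of statistical values (for example, the method used, the number of iterations, and the final objective value).
Table of objective values recorded every object_frequency iterations.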
- fit_predict(data, key, features=None)
Alias of fit_transform(). Reserved for backward compatibility.
- property fit_hdbprocedure
Returns the generated hdbprocedure for fit.
- property predict_hdbprocedure
Returns the generated hdbprocedure for predict.
Inherited Methods from PALBase
Besides the methods mentioned above, the TSNE class also inherits methods from the PALBase class; please refer to PAL Base for more details.