TSNE
- class hana_ml.algorithms.pal.tsne.TSNE(n_iter=None, learning_rate=None, object_frequency=None, n_components=None, angle=None, exaggeration=None, thread_ratio=None, random_state=None, perplexity=None)
Class for T-distributed Stochastic Neighbour Embedding.
- Parameters
- thread_ratio : float, optional
The ratio of available threads to use.
0 : single thread
0~1 : percentage of available threads
Defaults to 0.0.
- n_iter : int, optional
Specifies the maximum number of iterations for the TSNE algorithm.
Defaults to 250.
- random_state : int, optional
The seed for random number generation.
Defaults to 0.
- exaggeration : float, optional
Value by which \(p_{ij}\) is multiplied during the first 250 iterations.
Larger values separate the natural clusters more, which leaves more empty space on the map.
Defaults to 12.0.
- angle : float, optional
The value must be between 0.0 and 1.0.
Setting it to 0.0 selects the "exact" method, which runs in \(O(N^2)\) time; any other value makes TSNE use the Barnes-Hut approximation, which runs in \(O(N \log N)\) time.
For the Barnes-Hut approximation, this value is a tradeoff between accuracy and training speed.
Higher values train faster.
Defaults to 0.5.
- n_components : int, optional
Dimension of the embedded space.
Only 2 and 3 are valid values.
Defaults to 2.
- object_frequency : int, optional
Frequency of calculating the objective function and writing the result to the OBJECTIVES table.
This value should not be larger than the value assigned to n_iter.
Defaults to 50.
- learning_rate : float, optional
Learning rate.
Defaults to 200.0.
- perplexity : float, optional
The perplexity is related to the number of nearest neighbors.
Larger values are suitable for larger datasets.
Make sure perplexity * 3 < [no. of samples].
Defaults to 30.0.
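The constraints stated in the parameter descriptions above can be checked before a model is created. The following is a minimal sketch in plain Python; `check_tsne_params` is a hypothetical helper, not part of hana_ml, and its default values are simply copied from the descriptions above.

```python
def check_tsne_params(n_samples, n_iter=250, object_frequency=50,
                      n_components=2, angle=0.5, perplexity=30.0):
    """Return a list of violated constraints (empty if none are violated).

    Hypothetical helper, not part of hana_ml; it only mirrors the
    constraints stated in the parameter descriptions.
    """
    problems = []
    if not 0.0 <= angle <= 1.0:
        problems.append("angle must be between 0.0 and 1.0")
    if n_components not in (2, 3):
        problems.append("n_components must be 2 or 3")
    if object_frequency > n_iter:
        problems.append("object_frequency must not exceed n_iter")
    if not perplexity * 3 < n_samples:
        problems.append("perplexity * 3 must be less than the number of samples")
    return problems


# Six samples with the default perplexity of 30.0 violate the last rule.
print(check_tsne_params(n_samples=6))
```

For a dataset with only six rows, such as the one in the Examples section, the default perplexity is too large, which is why that example sets it to 1.0.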
Examples
Input dataframe for fit and predict:
>>> df_train.collect()
   ID  ATT1  ATT2  ATT3  ATT4  ATT5
0   1   1.0   2.0 -10.0 -20.0   3.0
1   2   4.0   5.0 -30.0 -10.0   6.0
2   3   7.0   8.0 -40.0 -50.0   9.0
3   4  10.0  11.0 -25.0 -15.0  12.0
4   5  13.0  14.0 -12.0 -24.0  15.0
5   6  16.0  17.0  -9.0 -13.0  18.0
Creating TSNE instance:
>>> tsne = TSNE(n_iter=500, n_components=3, angle=0, object_frequency=50, random_state=30, perplexity=1.0)
Performing fit_predict() on the given dataframe:
>>> res, stats, obj = tsne.fit_predict(data=df_train, key='ID')
>>> res.collect()
   ID           x           y           z
0   1    4.875853 -189.090497 -229.536424
1   2  -67.675459  213.661740  178.397623
2   3  -68.852910  162.710853  284.966271
3   4  -68.056108  193.118052  220.275439
4   5   76.524624 -189.850926 -227.625750
5   6  123.184000 -190.549221 -226.477160
>>> stats.collect()
   STAT_NAME           STAT_VALUE
0     method                exact
1       iter                  500
2  objective  0.12310845438143747
>>> obj.collect()
   ITER  OBJ_VALUE
0    50  50.347530
1   100  50.982194
2   150  49.368419
3   200  70.201283
4   250  63.717535
5   300   1.296687
6   350   0.882636
7   400   0.260532
8   450   0.174178
9   500   0.123108
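In the output above, an objective value is recorded every object_frequency iterations up to n_iter, and the last recorded value matches the objective statistic. Assuming that pattern holds in general (it is inferred from this example, not stated by the API), the recorded iteration numbers can be sketched in plain Python. `objective_iterations` is a hypothetical helper, not part of hana_ml.

```python
def objective_iterations(n_iter, object_frequency):
    """Iterations at which an objective value would be recorded.

    Assumption inferred from the example output: one row every
    `object_frequency` iterations, starting at `object_frequency`.
    """
    return list(range(object_frequency, n_iter + 1, object_frequency))


# Settings used in the example: n_iter=500, object_frequency=50.
iters = objective_iterations(n_iter=500, object_frequency=50)
print(iters)
```

Under this assumption, a smaller object_frequency gives a finer trace of the objective at the price of more objective evaluations.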
- Attributes
fit_hdbprocedure
Returns the generated hdbprocedure for fit.
predict_hdbprocedure
Returns the generated hdbprocedure for predict.
Methods
fit_predict(data, key[, features])
Alias of fit_transform().
fit_transform(data, key[, features])
Fit the TSNE model with input data.
- fit_transform(data, key, features=None)
Fit the TSNE model with input data. Model parameters should be given by initializing the model first.
- Parameters
- data : DataFrame
Data to be fitted.
- key : str, optional
Name of the ID column.
- features : list of str or str, optional
Name(s) of the feature column(s).
If not specified, all columns in the input dataframe except the key column are used as features.
- Returns
- DataFrames
Result table with the coordinate values in each embedded dimension.
Table of statistical values.
Table of objective values recorded during the iterations.
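The default for features described above amounts to taking every column except the key. A minimal sketch in plain Python, using a list of column names in place of a DataFrame; `resolve_features` is a hypothetical helper and does not reflect how hana_ml implements this internally.

```python
def resolve_features(columns, key, features=None):
    """Pick feature columns the way the `features` default is described.

    Hypothetical helper, not part of hana_ml.
    """
    if features is None:
        # Default: every column except the key column.
        return [c for c in columns if c != key]
    if isinstance(features, str):
        # A single column name is accepted as well as a list of names.
        return [features]
    return list(features)


# Column names taken from the example dataframe.
cols = ["ID", "ATT1", "ATT2", "ATT3", "ATT4", "ATT5"]
print(resolve_features(cols, key="ID"))
```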
- fit_predict(data, key, features=None)
Alias of fit_transform(). Reserved for backward compatibility.
- property fit_hdbprocedure
Returns the generated hdbprocedure for fit.
- property predict_hdbprocedure
Returns the generated hdbprocedure for predict.