TSNE

class hana_ml.algorithms.pal.tsne.TSNE(n_iter=None, learning_rate=None, object_frequency=None, n_components=None, angle=None, exaggeration=None, thread_ratio=None, random_state=None, perplexity=None)

Class for t-distributed Stochastic Neighbour Embedding (t-SNE).

Parameters:

thread_ratio : float, optional

The ratio of available threads to use.

0 : single thread

0~1 : fraction of the available threads

Defaults to 0.0.

n_iter : int, optional

Specifies the maximum number of iterations for the TSNE algorithm.

Defaults to 250.

random_state : int, optional

The seed for random number generation.

Defaults to 0.

exaggeration : float, optional

Factor by which \(p_{ij}\) is multiplied during the first 250 iterations.

Larger values separate the natural clusters more, leaving more empty space on the map.

Defaults to 12.0.

angle : float, optional

Must be between 0.0 and 1.0.

Setting it to 0.0 selects the "exact" method, which runs in \(O(N^2)\) time; any other value makes TSNE use the Barnes-Hut approximation, which runs in \(O(N \log N)\) time.

For the Barnes-Hut approximation, this value is a tradeoff between accuracy and training speed.

Higher values give faster training.

Defaults to 0.5.

n_components : int, optional

Dimension of the embedded space.

Only 2 and 3 are valid values.

Defaults to 2.

object_frequency : int, optional

Number of iterations between successive evaluations of the objective function; each evaluated value is written to the OBJECTIVES table.

This value should not be larger than the value assigned to n_iter.

Defaults to 50.

learning_rate : float, optional

Learning rate.

Defaults to 200.0.

perplexity : float, optional

The perplexity is related to the number of nearest neighbors considered for each point.

Larger values are suitable for larger datasets.

Make sure that perplexity * 3 < [no. of samples].

Defaults to 30.0.
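
The constraints stated in the parameter descriptions above can be checked on the client side before any call is made. The following is a minimal sketch, not part of hana_ml: the function name check_tsne_params and its return format are hypothetical, and the default values are the ones listed above.

```python
def check_tsne_params(n_samples, n_iter=250, n_components=2, angle=0.5,
                      object_frequency=50, perplexity=30.0):
    """Return a list of problems; an empty list means no issue was found.

    Hypothetical helper that mirrors the documented constraints only.
    """
    problems = []
    # Only a 2-D or 3-D embedded space is allowed.
    if n_components not in (2, 3):
        problems.append("n_components must be 2 or 3")
    # angle must lie between 0.0 and 1.0 (0.0 selects the exact method).
    if not 0.0 <= angle <= 1.0:
        problems.append("angle must be between 0.0 and 1.0")
    # The objective cannot be recorded less often than once per run.
    if object_frequency > n_iter:
        problems.append("object_frequency must not be larger than n_iter")
    # perplexity * 3 must be smaller than the number of samples.
    if perplexity * 3 >= n_samples:
        problems.append("perplexity * 3 must be smaller than the number of samples")
    return problems


# Six samples with the default perplexity of 30.0 violate the last rule.
print(check_tsne_params(n_samples=6))
# Six samples with perplexity=1.0 pass every check.
print(check_tsne_params(n_samples=6, n_iter=500, n_components=3,
                        angle=0, perplexity=1.0))
```

With a six-row dataframe, the perplexity has to be lowered well below its default for the last check to pass.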

Examples

Input dataframe for fit and predict:

>>> df_train.collect()
   ID  ATT1  ATT2  ATT3  ATT4  ATT5
 1   1.0   2.0 -10.0 -20.0   3.0
 2   4.0   5.0 -30.0 -10.0   6.0
 3   7.0   8.0 -40.0 -50.0   9.0
 4  10.0  11.0 -25.0 -15.0  12.0
 5  13.0  14.0 -12.0 -24.0  15.0
 6  16.0  17.0  -9.0 -13.0  18.0

Creating TSNE instance:

>>> tsne = TSNE(n_iter=500, n_components=3, angle=0,
...             object_frequency=50, random_state=30,
...             perplexity=1.0)

Performing fit_predict() on given dataframe:

>>> res, stats, obj = tsne.fit_predict(data=df_train, key='ID')

>>> res.collect()
   ID           x           y           z
 1    4.875853 -189.090497 -229.536424
 2  -67.675459  213.661740  178.397623
 3  -68.852910  162.710853  284.966271
 4  -68.056108  193.118052  220.275439
 5   76.524624 -189.850926 -227.625750
 6  123.184000 -190.549221 -226.477160

>>> stats.collect()
   STAT_NAME           STAT_VALUE
0     method                exact
1       iter                  500
2  objective  0.12310845438143747

>>> obj.collect()
   ITER  OBJ_VALUE
  50  50.347530
 100  50.982194
 150  49.368419
 200  70.201283
 250  63.717535
 300   1.296687
 350   0.882636
 400   0.260532
 450   0.174178
 500   0.123108
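
The objective table above can be read against two of the constructor parameters: the recorded iterations follow object_frequency, and the values drop sharply once the exaggeration phase ends. The sketch below works on plain Python values copied from the output above rather than on a HANA dataframe. Treating iteration 250 itself as part of the exaggeration phase is an assumption based on this output, not a documented rule.

```python
# (ITER, OBJ_VALUE) pairs copied from the obj.collect() output above.
objective_trace = [
    (50, 50.347530), (100, 50.982194), (150, 49.368419), (200, 70.201283),
    (250, 63.717535), (300, 1.296687), (350, 0.882636), (400, 0.260532),
    (450, 0.174178), (500, 0.123108),
]

n_iter, object_frequency = 500, 50
EXAGGERATION_ITERS = 250  # from the `exaggeration` description; boundary is assumed

# Iterations at which an objective value is expected to be recorded.
expected_iters = list(range(object_frequency, n_iter + 1, object_frequency))
recorded_iters = [it for it, _ in objective_trace]

# Split the trace into the exaggerated phase and the remaining iterations.
early = [value for it, value in objective_trace if it <= EXAGGERATION_ITERS]
late = [value for it, value in objective_trace if it > EXAGGERATION_ITERS]

print(recorded_iters == expected_iters)
print(min(early), max(late))
```

Every value recorded during the first 250 iterations is larger than every value recorded afterwards, so objective values from the two phases are not directly comparable when judging convergence.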

Attributes:

fit_hdbprocedure: Returns the generated hdbprocedure for fit.
predict_hdbprocedure: Returns the generated hdbprocedure for predict.

Methods

fit_predict(data, key[, features]): Alias of fit_transform().
fit_transform(data, key[, features]): Fit the TSNE model with input data.

fit_transform(data, key, features=None)

Fit the TSNE model with input data. Model parameters should be given by initializing the model first.

Parameters:

data : DataFrame

Data to be fit.

key : str, optional

Name of the ID column.

features : list of str or str, optional

Names of the feature columns.

If not specified, all columns of the input dataframe except the key column are used as features.

Returns:

DataFrames

Result table with the coordinate values of each embedded dimension.
Table of statistical values.
Table of objective values recorded over the iterations.
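
Because the three dataframes are returned positionally, it can help to give them names at the call site. The following is a minimal sketch under that assumption: TSNEOutput and run_tsne are hypothetical helpers, not part of hana_ml, and the only call made on the model is fit_transform(data, key, features) as documented above.

```python
from typing import Any, NamedTuple


class TSNEOutput(NamedTuple):
    """Hypothetical container naming the three returned dataframes."""
    embedding: Any   # result table with the embedded coordinates
    statistics: Any  # table of statistical values
    objectives: Any  # table of objective values over the iterations


def run_tsne(model, data, key, features=None) -> TSNEOutput:
    """Call model.fit_transform() and label its three results.

    `model` is expected to be an already initialized TSNE instance.
    """
    result, stats, objectives = model.fit_transform(
        data=data, key=key, features=features)
    return TSNEOutput(result, stats, objectives)
```

A call such as out = run_tsne(tsne, df_train, key='ID') then allows out.embedding.collect() in place of remembering the position of each table.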

fit_predict(data, key, features=None): Alias of fit_transform(). Reserved for backward compatibility.

property fit_hdbprocedure: Returns the generated hdbprocedure for fit.

property predict_hdbprocedure: Returns the generated hdbprocedure for predict.

Inherited Methods from PALBase

Besides the methods mentioned above, the TSNE class also inherits methods from the PALBase class. Please refer to PAL Base for more details.