Following a conversation with @amueller on Twitter (https://twitter.com/amuellerml/status/1285939094397366272):
- The scikit-learn implementation of t-SNE has a factor of 4 in the gradient that most other implementations (the original Barnes-Hut implementation, FIt-SNE, openTSNE) do not have. The factor is present in the formula, but the original implementation absorbed it into the learning rate, and other implementations followed that convention. This means that your `learning_rate` corresponds to a 4 times higher learning rate in any other implementation: your 200 is equivalent to 800 elsewhere. This should be mentioned in the docs (see the first sketch below).
- The scale of the random Gaussian initialization is std=1e-4, while the scale of the PCA initialization is whatever PCA outputs. But t-SNE works better when the initialization is small. I think it makes sense to scale the PCA initialization so that it has std=1e-4, as the random init does, and I would do that by default for PCA init (see the second sketch below).
  https://www.nature.com/articles/s41467-019-13056-x
  https://arxiv.org/abs/2007.08902
- I would suggest using `init='pca'` as the default rather than `init='random'`; PCA init performs much better.
  https://www.nature.com/articles/s41467-019-13056-x
  https://www.nature.com/articles/s41587-020-00809-z
- It has been shown that the learning rate needs to grow with the sample size, especially for large datasets. A recently suggested heuristic is learning_rate = n/12 (this would be n/48 using your definition of the learning rate):
  https://www.nature.com/articles/s41467-019-13056-x
  https://www.nature.com/articles/s41467-019-13055-y
  I would suggest implementing this as `learning_rate='auto'` and making it the default (see the third sketch below).
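
A minimal sketch of the learning-rate conversion implied by the factor of 4. The dataset and the value 800 are placeholders for illustration, not something fixed by this issue:

```python
import numpy as np
from sklearn.manifold import TSNE

X = np.random.RandomState(0).randn(1000, 50)  # placeholder data

# A learning rate taken from another implementation (Barnes-Hut reference
# code, FIt-SNE, openTSNE) must be divided by 4 to behave the same way in
# scikit-learn, because scikit-learn keeps the factor of 4 in the gradient.
other_lr = 800
sklearn_lr = other_lr / 4  # 200, scikit-learn's current default

Z = TSNE(learning_rate=sklearn_lr, random_state=0).fit_transform(X)
```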
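
A sketch of the proposed PCA-initialization rescaling, done by hand via the `init` array that `TSNE` already accepts. Rescaling by the std of the first column is one convention (the one openTSNE uses) and is an assumption here:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X = np.random.RandomState(0).randn(1000, 50)  # placeholder data

# PCA initialization, rescaled so the embedding starts small (std = 1e-4),
# matching the scale of the default random Gaussian initialization.
pca_init = PCA(n_components=2).fit_transform(X)
pca_init = pca_init / np.std(pca_init[:, 0]) * 1e-4

Z = TSNE(init=pca_init, random_state=0).fit_transform(X)
```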
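
A sketch of what a `learning_rate='auto'` heuristic could look like, computed by hand until such an option exists. The floor of 50 is my assumption to keep small datasets sane, not part of the cited heuristic:

```python
import numpy as np
from sklearn.manifold import TSNE

X = np.random.RandomState(0).randn(1000, 50)  # placeholder data

n = X.shape[0]
# Heuristic from the cited papers: learning_rate = n / 12, which becomes
# n / 48 under scikit-learn's definition because of the extra factor of 4.
# The floor of 50 is an assumed safeguard for small n.
auto_lr = max(n / 48, 50)

Z = TSNE(learning_rate=auto_lr, random_state=0).fit_transform(X)
```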