Following a conversation with @amueller on Twitter (https://twitter.com/amuellerml/status/1285939094397366272):
- The scikit-learn implementation of t-SNE has a factor of 4 in the gradient that most other implementations (the original Barnes-Hut implementation, FIt-SNE, openTSNE) do not have. The factor is present in the formula, but the original implementation absorbed it into the learning rate, and other implementations followed that convention. This means that your `learning_rate` corresponds to a 4 times higher learning rate in any other implementation: your 200 is equivalent to 800 elsewhere. This should be mentioned in the docs (see the first sketch below).
- The scale of the random Gaussian initialization is std=1e-4, while the scale of the PCA initialization is whatever PCA outputs. But t-SNE works better when the initialization is small. I think it makes sense to scale the PCA initialization so that it has std=1e-4, as the random init does, and I would do that by default for PCA init (see the second sketch below).
  https://www.nature.com/articles/s41467-019-13056-x
  https://arxiv.org/abs/2007.08902
- I would suggest using `init='pca'` as the default rather than `init='random'`; PCA init performs much better.
  https://www.nature.com/articles/s41467-019-13056-x
  https://www.nature.com/articles/s41587-020-00809-z
- It has been shown that the learning rate needs to grow with the sample size, especially for large datasets. A recently suggested heuristic is learning_rate = n/12 (this would be n/48 using your definition of the learning rate):
  https://www.nature.com/articles/s41467-019-13056-x
  https://www.nature.com/articles/s41467-019-13055-y
  I would suggest implementing this as `learning_rate='auto'` and making it the default (see the third sketch below).
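
A minimal sketch of the learning-rate conversion implied by the factor of 4. The dataset and the value 800 are placeholders for illustration, not something fixed by this issue:

```python
import numpy as np
from sklearn.manifold import TSNE

X = np.random.RandomState(0).randn(1000, 50)  # placeholder data

# A learning rate taken from another implementation (Barnes-Hut reference
# code, FIt-SNE, openTSNE) must be divided by 4 to behave the same way in
# scikit-learn, because scikit-learn keeps the factor of 4 in the gradient.
other_lr = 800
sklearn_lr = other_lr / 4  # 200, scikit-learn's current default

Z = TSNE(learning_rate=sklearn_lr, random_state=0).fit_transform(X)
```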
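
A sketch of the proposed PCA-initialization rescaling, done by hand via the `init` array that `TSNE` already accepts. Rescaling by the std of the first column is one convention (the one openTSNE uses) and is an assumption here:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X = np.random.RandomState(0).randn(1000, 50)  # placeholder data

# PCA initialization, rescaled so the embedding starts small (std = 1e-4),
# matching the scale of the default random Gaussian initialization.
pca_init = PCA(n_components=2).fit_transform(X)
pca_init = pca_init / np.std(pca_init[:, 0]) * 1e-4

Z = TSNE(init=pca_init, random_state=0).fit_transform(X)
```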
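
A sketch of what a `learning_rate='auto'` heuristic could look like, computed by hand until such an option exists. The floor of 50 is my assumption to keep small datasets sane, not part of the cited heuristic:

```python
import numpy as np
from sklearn.manifold import TSNE

X = np.random.RandomState(0).randn(1000, 50)  # placeholder data

n = X.shape[0]
# Heuristic from the cited papers: learning_rate = n / 12, which becomes
# n / 48 under scikit-learn's definition because of the extra factor of 4.
# The floor of 50 is an assumed safeguard for small n.
auto_lr = max(n / 48, 50)

Z = TSNE(learning_rate=auto_lr, random_state=0).fit_transform(X)
```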