t-SNE default parameters and clarifying documentation #18018

@dkobak

Description

Following a conversation with @amueller on Twitter (https://twitter.com/amuellerml/status/1285939094397366272):

  1. The scikit-learn implementation of t-SNE has a factor of 4 in the gradient that most other implementations (the original Barnes-Hut implementation, FIt-SNE, openTSNE) do not have. This factor is there in the formula, but the original implementation absorbed it into the learning rate, and other implementations followed that convention. This means that your learning_rate corresponds to a 4 times higher learning rate in any other implementation: your 200 is equivalent to 800 elsewhere. This should be mentioned in the docs.
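
    The factor-of-4 convention difference can be sketched as a pair of conversion helpers (the function names here are mine, purely illustrative, not part of any library):

    ```python
    # scikit-learn keeps the factor of 4 in the gradient, so its learning_rate
    # is effectively 4x smaller than the same number in Barnes-Hut t-SNE,
    # FIt-SNE, or openTSNE.
    def to_other_convention(lr_sklearn: float) -> float:
        """Equivalent learning rate in the Barnes-Hut/FIt-SNE/openTSNE convention."""
        return 4.0 * lr_sklearn

    def to_sklearn_convention(lr_other: float) -> float:
        """Equivalent learning rate in scikit-learn's convention."""
        return lr_other / 4.0

    print(to_other_convention(200.0))    # 800.0 -- sklearn's default of 200 maps to 800
    print(to_sklearn_convention(800.0))  # 200.0
    ```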

  2. The scale of the random Gaussian initialization is std=1e-4, while the scale of the PCA initialization is whatever PCA outputs. But t-SNE works better when the initialization is small. I think it makes sense to scale the PCA initialization so that it has std=1e-4, as the random init does, and I would do that by default for PCA init.
    https://www.nature.com/articles/s41467-019-13056-x
    https://arxiv.org/abs/2007.08902
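
    A minimal sketch of the proposed rescaling (this is not scikit-learn's current behavior; the toy data is mine): take the PCA projection and shrink it so that the first embedding dimension has std 1e-4, matching the random Gaussian initialization.

    ```python
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 50))  # toy data standing in for real features

    # Project to 2D with PCA, then rescale so that the first embedding
    # dimension has standard deviation 1e-4.
    init = PCA(n_components=2).fit_transform(X)
    init = init / np.std(init[:, 0]) * 1e-4

    print(np.std(init[:, 0]))  # now on the same scale as the random init
    ```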

  3. I would suggest using init='pca' as the default instead of init='random'. PCA init performs much better.
    https://www.nature.com/articles/s41467-019-13056-x
    https://www.nature.com/articles/s41587-020-00809-z
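
    For reference, PCA initialization can already be requested explicitly via the existing init parameter (toy data below is mine; the proposal is only about making this the default):

    ```python
    import numpy as np
    from sklearn.manifold import TSNE

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))  # toy data standing in for real features

    # Explicit PCA initialization, as proposed for the default.
    emb = TSNE(n_components=2, init="pca", random_state=0).fit_transform(X)
    print(emb.shape)  # one 2D point per input sample
    ```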

  4. It has been shown that the learning rate needs to grow with the sample size, especially for large datasets. A recently suggested heuristic is learning_rate = n/12 (this would be n/48 using your definition of the learning rate):
    https://www.nature.com/articles/s41467-019-13056-x
    https://www.nature.com/articles/s41467-019-13055-y
    I would suggest implementing this as learning_rate='auto' and making it the default.
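
    The heuristic itself is a one-liner; a sketch of what an 'auto' setting could compute (the function name is illustrative, not an existing scikit-learn API):

    ```python
    # learning_rate = n/12 in the conventional parametrization, which becomes
    # n/48 once scikit-learn's extra factor of 4 in the gradient is taken
    # into account.
    def auto_learning_rate(n_samples: int, sklearn_convention: bool = True) -> float:
        lr = n_samples / 12.0  # heuristic from the two papers cited above
        return lr / 4.0 if sklearn_convention else lr

    print(auto_learning_rate(48_000))         # 1000.0 (sklearn convention)
    print(auto_learning_rate(48_000, False))  # 4000.0 (other implementations)
    ```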
