Bootsnap compile cache isn't as useful as it could be on CI and production #336

@casperisfine

Problem 1: cache keys are mtime based

Bootsnap's compile cache is very efficient for development workflows, but on CI or production, it becomes almost unusable.

The main reason is that git (and most other VCS) doesn't store mtime, so on CI or production, unless your setup manages to preserve mtime, the entire compile cache will be invalidated. And most CI / production systems start from a fresh clone of the repository.

The solution to this would be to use file digests instead of mtime. Of course, hashing a source file is slower than just reading its mtime, but compared to parsing the Ruby source file, a fast hash function would still offer a major speedup.
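To illustrate the difference, here is a minimal sketch, not Bootsnap's actual implementation. `mtime_key` and `digest_key` are hypothetical helper names, and MD5 only stands in for whichever fast hash function would actually be chosen. Resetting the mtime with `File.utime` simulates what a fresh clone does to a file whose content hasn't changed.

```ruby
require "digest"
require "tempfile"

# Hypothetical helper: a cache key derived from the modification time.
def mtime_key(path)
  File.mtime(path).to_i
end

# Hypothetical helper: a cache key derived from the file content.
# MD5 is only a stand-in for "some fast hash function".
def digest_key(path)
  Digest::MD5.hexdigest(File.binread(path))
end

Tempfile.create(["example", ".rb"]) do |file|
  file.write("puts :hello\n")
  file.flush

  before = [mtime_key(file.path), digest_key(file.path)]
  # Same bytes, different mtime: roughly what a fresh checkout produces.
  File.utime(Time.at(0), Time.at(0), file.path)
  after = [mtime_key(file.path), digest_key(file.path)]

  puts "mtime key changed:  #{before[0] != after[0]}"
  puts "digest key changed: #{before[1] != after[1]}"
end
```

The mtime-based key no longer matches after the timestamp is reset, while the digest-based key is untouched, so a digest-keyed cache would survive a fresh clone.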

Problem 2: the cache isn't self cleaning

The compile cache entries are stored based on the path of the source file. For example, the cache for path/to.rb is stored in <cache-dir>/<fnv1a_64(path)>. So if you keep persisting the cache between CI builds or production deploys, then over time, as you delete source files, update gems, etc., new entries are created but outdated ones are never removed, which can lead to a very bloated cache.

This is why the README has a note about regularly flushing the cache.

And the problem can be even worse with some deploy methods like capistrano, with which the real path of the source files changes on every deploy.

So even if we were to fix the mtime issue, we'd need to address cache GC, otherwise users would run into serious trouble.
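As a rough illustration of why path-keyed entries pile up, here is a plain-Ruby sketch of 64-bit FNV-1a and of a path-to-entry mapping. `cache_entry_path` and the hex file name format are assumptions made for the example; Bootsnap's real on-disk layout may differ. The release directories are made-up paths.

```ruby
# Standard 64-bit FNV-1a parameters.
FNV1A_64_OFFSET_BASIS = 0xcbf29ce484222325
FNV1A_64_PRIME = 0x100000001b3
MASK_64 = 0xffffffffffffffff

# XOR each byte into the hash, then multiply by the prime, keeping 64 bits.
def fnv1a_64(string)
  string.each_byte.reduce(FNV1A_64_OFFSET_BASIS) do |hash, byte|
    ((hash ^ byte) * FNV1A_64_PRIME) & MASK_64
  end
end

# Hypothetical layout: one file per source path, named by the hash in hex.
def cache_entry_path(cache_dir, source_path)
  File.join(cache_dir, format("%016x", fnv1a_64(source_path)))
end

# The same source file under two release directories (made-up paths)
# maps to two different cache entries.
old_entry = cache_entry_path("tmp/cache", "/app/releases/1/lib/foo.rb")
new_entry = cache_entry_path("tmp/cache", "/app/releases/2/lib/foo.rb")
puts old_entry
puts new_entry
```

Because the key depends only on the path, nothing ever maps back to the old entry once the path disappears, so it stays on disk until someone deletes it.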

Here I'm not too sure what the best solution would be, but I have a few ideas.

Solution 2.1: Splitting the cache

Assuming the biggest source of cache garbage is gem upgrades, we could have one compile cache directory per gem, e.g. we could store cache for $GEM_ROOT/gems/my-gem-1.2.3/lib/my-gem.rb in $GEM_ROOT/gems/my-gem-1.2.3/.bootsnap/<fnv1a_64(path)>, or even $GEM_ROOT/gems/my-gem-1.2.3/lib/my-gem.rb.bootsnap.

This way when you upgrade or remove a gem you automatically get rid of the old cache.

However:

  • This assumes the gem directory is writable, which is not always the case.
  • It requires looking up the gem root directory, which might be costly (unless we use the second path format).

I think that if we were to implement this, the vast majority of the GC problem would be solved, as path changes inside the application are much less likely to be frequent enough to cause the problem, unless you keep the cache for a very long time.
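The two path formats could look roughly like this. This is only a sketch: `per_gem_cache_dir`, `sibling_cache_path` and the regular expression used to guess the gem root are assumptions for the example, not existing Bootsnap or RubyGems APIs, and a real implementation would likely need a more robust lookup.

```ruby
# Heuristic (an assumption): a gem root looks like ".../gems/<name>-<version>".
GEM_DIR_PATTERN = %r{\A(?<gem_root>.*/gems/[^/]+-\d[^/]*)/}

# First format: one .bootsnap directory per installed gem.
# Returns nil when the path doesn't look like it belongs to a gem.
def per_gem_cache_dir(source_path)
  match = GEM_DIR_PATTERN.match(source_path)
  match && File.join(match[:gem_root], ".bootsnap")
end

# Second format: the cache sits next to the source file, so no
# gem root lookup is needed at all.
def sibling_cache_path(source_path)
  "#{source_path}.bootsnap"
end

gem_file = "/opt/ruby/gems/3.2.0/gems/my-gem-1.2.3/lib/my-gem.rb"
puts per_gem_cache_dir(gem_file)
puts sibling_cache_path(gem_file)
```

In both cases the cache lives inside the gem's own directory, so removing or upgrading the gem removes its cache entries with it.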

Solution 2.2: bootsnap precompile --clean

This is much less of a general solution, as I don't think it is likely that a large portion of users would integrate bootsnap precompile into their workflow, but in theory we could have it clean the outdated cache entries. Since it goes over all the source files to precompile them, it can build a list of up-to-date cache entries and delete the rest.
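The cleaning step could be sketched as follows. `clean_stale_entries` and `cache_entry_name` are hypothetical names, and the sketch assumes a flat cache directory with one file per source path, named after the hex FNV-1a hash of that path, which may not match Bootsnap's real layout.

```ruby
require "fileutils"
require "set"

# 64-bit FNV-1a of the source path, rendered as hex (assumed naming scheme).
def cache_entry_name(source_path)
  hash = source_path.each_byte.reduce(0xcbf29ce484222325) do |acc, byte|
    ((acc ^ byte) * 0x100000001b3) & 0xffffffffffffffff
  end
  format("%016x", hash)
end

# Keep the entries that belong to the given source files, delete the rest.
# Returns the names of the entries that were removed.
def clean_stale_entries(cache_dir, source_paths)
  live = source_paths.map { |path| cache_entry_name(path) }.to_set
  stale = Dir.children(cache_dir).reject { |name| live.include?(name) }
  stale.each { |name| FileUtils.rm_f(File.join(cache_dir, name)) }
  stale
end
```

A precompile run already knows every source path it visited, so it could pass that list as `source_paths` once it has finished compiling.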

Thoughts

These two changes aren't necessarily that hard to implement, but they are quite significant, likely justifying a major version bump. So rather than starting to write PRs right away, I'd like to get some feedback on the idea.

@burke I saw you removed yourself from the CODEOWNERS, but if you have a bit of time your insights here would be more than welcome.

@rafaelfranca @DazWorrall I think you may have opinions or insights on this.
