
Idea: don't re-analyze files depending on their SHA #169

@vemv


Problem statement

Any given linter (particularly Eastwood) can be slow.

When running branch-formatter integrated with the REPL, one might end up linting again files that haven't been modified.

For example:

  • I'm working in a long-lived branch that has touched 30 files
    • accordingly, the branch-formatter strategy will dictate that 30 files must be analyzed per lint! invocation
  • But I'm only actively working in a smaller subset, like 1 or 2 files
    • The other files touched by the branch may already have been analyzed, which makes re-analyzing them redundant.

Proposal

Implement a strategy that removes a file from the set to be linted if all of these conditions are met:

  • the given file has already been successfully analyzed
    • i.e. the linter produced its warnings (possibly none) and no exception was thrown
  • the SHA of the file's contents hasn't changed
    • (File#lastModified can also work)
  • no linter warnings have been emitted for this particular file
    • this is important: otherwise linter warnings would be shown just once, and then elided in subsequent runs because the file hadn't changed.
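A minimal sketch of how the three conditions above could be checked. Everything here is hypothetical and not formatting-stack code: the namespace, the function names (sha-of, skip-file?, remove-unchanged, record-result) and the cache shape, a map from filename to {:sha ..., :analyzed? ..., :warnings ...}, are assumptions made for illustration. SHA-256 stands in for "the SHA"; File#lastModified would be a cheaper but less precise substitute.

```clojure
(ns example.sha-cache
  "Hypothetical namespace illustrating the proposed strategy; not part of formatting-stack."
  (:require [clojure.java.io :as io])
  (:import (java.nio.file Files)
           (java.security MessageDigest)))

(defn sha-of
  "Returns the SHA-256 hex digest of the file's contents."
  [filename]
  (let [content (Files/readAllBytes (.toPath (io/file filename)))
        digest  (.digest (MessageDigest/getInstance "SHA-256") content)]
    ;; bit-and turns signed bytes into 0..255 before hex formatting
    (apply str (map #(format "%02x" (bit-and % 0xff)) digest))))

(defn skip-file?
  "True when `filename` can be elided from this lint run: a previous run
  analyzed it without throwing, it yielded no warnings, and its contents
  still hash to the recorded SHA."
  [cache filename]
  (let [{:keys [sha analyzed? warnings]} (get cache filename)]
    (boolean (and analyzed?
                  (empty? warnings)
                  (.exists (io/file filename))
                  (= sha (sha-of filename))))))

(defn remove-unchanged
  "Drops the files that `skip-file?` deems redundant to re-analyze."
  [cache filenames]
  (remove (partial skip-file? cache) filenames))

(defn record-result
  "Returns `cache` updated after linting `filename`.
  Pass `analyzed?` false when the linter threw an exception."
  [cache filename analyzed? warnings]
  (assoc cache filename {:sha       (sha-of filename)
                         :analyzed? analyzed?
                         :warnings  warnings}))
```

Note that a file with warnings is never skipped, so its warnings keep being reported on every run until they are fixed.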

Caveats

It's plausible that a given file's linting result does not depend entirely on the file itself: a warning might be fixed by modifying a different file.

That could leave stale cache entries behind.

However, I'm not immediately aware of such a case. Still worth thinking through, on a per-linter basis.

Applicability

This optimization is only likely to be useful in a local dev REPL, so, out of caution, I would not add the proposed strategy to the stack if (System/getenv "CI") is set.
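A minimal sketch of the CI guard, assuming the stack's strategies can be treated as a plain vector; that shape and the names ci? and with-sha-strategy are hypothetical. The on-ci? argument exists only so the decision can be exercised without touching the real environment.

```clojure
(ns example.ci-guard
  "Hypothetical namespace; the vector-of-strategies shape is an assumption.")

(defn ci?
  "True when the CI environment variable is set, mirroring (System/getenv \"CI\")."
  []
  (some? (System/getenv "CI")))

(defn with-sha-strategy
  "Returns `strategies` with `sha-strategy` appended, unless running on CI."
  ([strategies sha-strategy]
   (with-sha-strategy strategies sha-strategy (ci?)))
  ([strategies sha-strategy on-ci?]
   (if on-ci?
     (vec strategies)
     (conj (vec strategies) sha-strategy))))
```

Appending the strategy last means that, assuming strategies are applied in order, only the files that survived the earlier filters would need hashing.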

Other

I think the cache should be keyed per git branch, and possibly cleared whenever a branch change is detected anyway. This is a basic precaution against the caveats mentioned above.
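A sketch of the "clear on branch change" variant, under the assumption that the cache lives in a single atom; the namespace, cache shape and function names are hypothetical. The branch name is read with git rev-parse --abbrev-ref HEAD.

```clojure
(ns example.branch-cache
  "Hypothetical namespace: a lint cache that is discarded on git branch changes."
  (:require [clojure.java.shell :as shell]
            [clojure.string :as string]))

(defn current-branch
  "Returns the current git branch name, or nil if git is unavailable or fails."
  []
  (try
    (let [{:keys [exit out]} (shell/sh "git" "rev-parse" "--abbrev-ref" "HEAD")]
      (when (zero? exit)
        (string/trim out)))
    (catch java.io.IOException _
      nil)))

(defonce cache
  ;; {:branch "name", :entries {filename {...}}}
  (atom {:branch nil :entries {}}))

(defn entries-for-branch!
  "Returns the cached entries for `branch`, first discarding all entries
  if they were recorded on a different branch."
  [branch]
  (:entries (swap! cache (fn [{old-branch :branch :as state}]
                           (if (= old-branch branch)
                             state
                             {:branch branch :entries {}})))))
```

Usage would be along the lines of (entries-for-branch! (current-branch)) at the start of each lint! invocation. Storing a single branch means switching back and forth always starts from an empty cache, which is the cautious option.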

Alternatives and comparison

Cache the linters' reports instead, e.g. (caching-linter/new (linters.kondo/new {}))

This might perform better.
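A minimal sketch of the report-caching alternative. It assumes a linter can be modeled as a plain function (fn [filenames] reports) whose reports are maps carrying a :filename key; that model, the namespace and the caching-linter function are hypothetical stand-ins, not formatting-stack's actual linter protocol. Unlike the strategy above, cached warnings are replayed instead of elided, so files with warnings can be skipped too.

```clojure
(ns example.caching-linter
  "Hypothetical sketch: a linter is modeled as (fn [filenames] reports),
  each report being a map with a :filename key."
  (:require [clojure.java.io :as io])
  (:import (java.nio.file Files)
           (java.security MessageDigest)))

(defn- sha-of
  "SHA-256 hex digest of the file's contents."
  [filename]
  (let [content (Files/readAllBytes (.toPath (io/file filename)))]
    (->> (.digest (MessageDigest/getInstance "SHA-256") content)
         (map #(format "%02x" (bit-and % 0xff)))
         (apply str))))

(defn caching-linter
  "Wraps `lint-fn` so that files with an unchanged SHA replay their cached
  reports, and only the remaining (stale) files are actually linted."
  [lint-fn]
  (let [cache (atom {})] ; filename -> {:sha "...", :reports [...]}
    (fn [filenames]
      (let [shas    (into {} (map (juxt identity sha-of)) filenames)
            stale   (filterv #(not= (shas %) (get-in @cache [% :sha])) filenames)
            ;; If lint-fn throws, the exception propagates and nothing is cached.
            reports (when (seq stale) (lint-fn stale))
            by-file (group-by :filename reports)]
        (doseq [f stale]
          (swap! cache assoc f {:sha (shas f) :reports (get by-file f [])}))
        (let [snapshot @cache]
          (vec (mapcat #(get-in snapshot [% :reports]) filenames)))))))
```

Reports without a :filename key would be silently dropped by this sketch; the same cross-file staleness caveat applies as for the SHA strategy.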

cc/ @thumbnail
