Info on State Prefetching #22724

@jon-chuang

Dear Geth Team,

I am currently looking into some optimisations to Geth related to parallelism, state fetching and snapshots as part of my work on the downstream celo-blockchain.

Although it is not a priority, I think there is a chance that some work done could end up being merged upstream. Even if not, I think it is still a good idea to start a conversation.

I have a number of questions:

  1. Is it correct to say that, as of now, no optimisations make use of the Berlin-activated AccessList, either for Trie prefetching (i.e. DB warming) or for snapshot object prefetching? What are the team’s plans for this going forward? More generally, is there a roadmap capturing planned features, or are these only captured ephemerally in discussions and issues?
  2. Somewhat relatedly, is it correct to say that the core/state_prefetcher.go code is outdated, in the sense that one would not want to call TransitionDb in advance just to warm the state, since that is costly, all the more so when done in parallel? What is the (future) purpose of this code, given that it is currently not invoked anywhere as far as I can tell? Is the plan for this code to morph into DB prefetching based on the AccessList? If so, what would be a good entry point for spawning the prefetching?
  3. Why does the TriePrefetcher that lives in the StateDB use one goroutine per state root to walk the Trie? This seems inefficient to me; I would expect one goroutine per Trie.TryGet call to be walked. Concretely, one could maintain a hashmap of Tries, pop tasks off a global task list, and spawn as many goroutines as there are tasks, with the global prefetcher loop responding to stop commands. This way, one can leverage concurrency for Trie prefetching even within a single contract’s state Trie. Since a disk fetch costs far more than a goroutine context switch, I think this is a reasonable change. As it is relatively isolated, I could submit a PR to familiarise myself with the contribution process on geth; I’ll open an issue first. One question is whether it is preferable to maintain copies of the Tries per goroutine, since a copy is rather lightweight (just a pointer to a (global?) Database and the root node). Further, if the intention is only to warm the underlying DB, there is not much reason (apart, perhaps, from convenience) to keep the prefetcher’s Tries in memory, is there? In essence, one can delete the prefetcher’s Trie object and request the Trie anew from the Database elsewhere, as long as a single global Database object is maintained.
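
A minimal sketch of the scheme described in point 3, to make the proposal concrete. It does not use geth's actual types: `Getter`, `Opener`, `Prefetcher`, and the string-typed root are all hypothetical stand-ins, and the sketch assumes the shared database behind `Opener` is safe for concurrent use. Each task opens its own lightweight trie handle rather than sharing one, which corresponds to the "copies of Tries per goroutine" option raised above.

```go
package main

import "sync"

// Getter is a hypothetical stand-in for a single trie lookup (e.g. TryGet).
// Its result is ignored; the call is made only to pull nodes from disk.
type Getter func(key []byte) ([]byte, error)

// Opener is a hypothetical stand-in for opening a trie by root from one
// shared database. It is assumed to be safe for concurrent use.
type Opener func(root string) (Getter, error)

// task names one key to warm in the trie identified by root.
type task struct {
	root string
	key  []byte
}

// Prefetcher runs one global loop that spawns one goroutine per task.
type Prefetcher struct {
	open  Opener
	tasks chan task
	stop  chan struct{}
	done  chan struct{}
	once  sync.Once
	wg    sync.WaitGroup
}

func NewPrefetcher(open Opener, queueSize int) *Prefetcher {
	p := &Prefetcher{
		open:  open,
		tasks: make(chan task, queueSize),
		stop:  make(chan struct{}),
		done:  make(chan struct{}),
	}
	go p.loop()
	return p
}

// loop pops tasks off the global queue until a stop command arrives.
// Tasks still queued at that point are abandoned; in-flight ones finish.
func (p *Prefetcher) loop() {
	defer close(p.done)
	for {
		select {
		case <-p.stop:
			p.wg.Wait()
			return
		case t := <-p.tasks:
			p.wg.Add(1)
			go func() {
				defer p.wg.Done()
				get, err := p.open(t.root) // per-goroutine trie handle
				if err != nil {
					return
				}
				_, _ = get(t.key) // only the warming side effect matters
			}()
		}
	}
}

// Prefetch enqueues a task without blocking. It returns false if the
// prefetcher has been stopped or the queue is full.
func (p *Prefetcher) Prefetch(root string, key []byte) bool {
	select {
	case <-p.stop:
		return false
	default:
	}
	select {
	case p.tasks <- task{root: root, key: key}:
		return true
	default:
		return false
	}
}

// Stop signals the loop and waits for in-flight lookups to finish.
func (p *Prefetcher) Stop() {
	p.once.Do(func() { close(p.stop) })
	<-p.done
}
```

The sketch spawns goroutines without a bound, as proposed; a real implementation would probably cap them, for example with a semaphore. Because no trie object outlives its task, nothing is kept in memory beyond the shared database, which matches the last question in point 3.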

Cheers,
Jon
