Info on State Prefetching

Dear Geth Team,

I am currently looking into some optimisations to Geth related to parallelism, state fetching and snapshots as part of my work on the downstream `celo-blockchain`. 

Although it is not a priority, I think there is a chance that some of this work could end up being merged upstream. Even if not, I think it is still a good idea to start a conversation.

I have a number of questions:
1. Is it correct to say that, as of now, there are no optimisations based on the Berlin-activated `AccessList`, either for `Trie` prefetching (i.e. DB warming) or for snapshot object prefetching? What are the team’s plans for this going forward? More generally, is there a roadmap capturing planned features, or are these only captured ephemerally in discussions/issues?
2. Somewhat relatedly, is it correct to say that the `core/state_prefetcher.go` code is outdated, in the sense that one would not want to call `TransitionDB` in advance (and in parallel, at that) just to warm the state, since it is costly? What is the (future) purpose of this code, given that it is currently not invoked anywhere as far as I can tell? Is the plan for it to morph into DB prefetching based on `AccessList`? If so, what would be a good entry point for spawning the prefetching?
3. Why does the `TriePrefetcher` that lives in the `StateDB` use one goroutine per state root to walk the `Trie`? This seems inefficient to me: I would expect one goroutine per `trie.TryGet` to be walked. Concretely, one could maintain a hashmap of `Trie`s, pop tasks off a global task list, and spawn as many goroutines as there are tasks, with the global prefetcher loop still responding to stop commands. This way, one can leverage concurrency for `Trie` prefetching even within a single contract’s state `Trie`. Since `disk fetch >> goroutine context switch`, I think this is a reasonable change. As it is relatively isolated, I could submit a PR for it to familiarise myself with the contribution process on geth; I guess I’ll open an issue first. One question is whether it is preferable to maintain a copy of each `Trie` per goroutine (since a copy is rather lightweight, just a pointer to a (global?) `Database` and the root node). Further, if the intention is to warm the underlying DB, there is not much reason (apart, perhaps, from convenience) to keep the prefetcher `Trie`s in memory, is there? In essence, as long as a single global `Database` object is maintained, one can delete the prefetcher `Trie` object and later request the `Trie` anew from the `Database` elsewhere.

Cheers,
Jon


Info on State Prefetching #22724
