@arnetheduck arnetheduck commented May 1, 2025

VertexID has a 64-bit value space, but the chain state database currently uses only a tiny fraction of it (~27 bits). Here, we split the number space into a fixed portion statically allocated from the MPT path and a dynamic portion for leaves and storage slots.

The static portion simply allocates a (breadth-first) number based on the first nibbles in the address/path while any "deeper" paths instead get a dynamic VertexID like before.
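As a rough illustration of breadth-first numbering, here is a minimal Python sketch (the project itself is not written in Python). It assumes a full 16-ary tree, ids starting at 0 for the root, and static ids for depths 0..7; the names `level_offset`, `static_vid` and `STATIC_LEVELS` are hypothetical and the actual layout in this PR may differ.

```python
NIBBLE_FANOUT = 16  # an MPT branch has 16 children, one per nibble
STATIC_LEVELS = 8   # assumption: depths 0..7 receive static ids


def level_offset(depth: int) -> int:
    """Number of ids taken by all levels above `depth` in a full 16-ary tree."""
    return (NIBBLE_FANOUT**depth - 1) // (NIBBLE_FANOUT - 1)


def static_vid(nibbles, depth: int) -> int:
    """Breadth-first id of the node reached by the first `depth` nibbles.

    Hypothetical helper: ids start at 0 for the root, which is an
    assumption of this sketch and not taken from the PR.
    """
    if not 0 <= depth < STATIC_LEVELS:
        raise ValueError("depth is outside the statically numbered levels")
    if len(nibbles) < depth:
        raise ValueError("path is shorter than the requested depth")
    prefix = 0
    for nibble in nibbles[:depth]:
        prefix = prefix * NIBBLE_FANOUT + nibble
    return level_offset(depth) + prefix


# Total number of ids consumed by the static portion under these assumptions.
STATIC_ID_COUNT = level_offset(STATIC_LEVELS)
```

Under these assumptions every level occupies its own contiguous id range, so the id of a node follows from its path prefix alone, and the eight static levels together use 286,331,153 ids, which is below 2**32.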

Since the VertexID is path-based, we can more or less guess the VertexID of any node whose path we know, based on the "average" depth of the state trie. When we're lucky, a single lookup is sufficient to find the node instead of a one-by-one traversal of each level.

Even in the case that a single lookup is not enough and the actual node is "deeper" than the guess, the starting point helps skip a few levels at least.
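The guessed-depth lookup described above could look roughly like the following Python sketch. It is a toy model, not the code in this PR: the database is a plain dict keyed by a hypothetical breadth-first `static_vid`, the trie holds only branches and leaves, and the back-off strategy for a guess that is too deep is an assumption.

```python
def static_vid(nibbles, depth):
    """Hypothetical breadth-first id: ids of shallower levels + path prefix."""
    prefix = 0
    for nibble in nibbles[:depth]:
        prefix = prefix * 16 + nibble
    return (16**depth - 1) // 15 + prefix


def find_leaf(db, nibbles, guess):
    """Return (value, number_of_db_lookups) for the leaf stored under `nibbles`.

    `db` maps static ids to ("branch",) or ("leaf", full_path, value).
    """
    lookups = 0
    depth = min(guess, len(nibbles))
    node = None
    # Phase 1: probe the guessed depth; back off toward the root on a miss.
    while depth >= 0:
        lookups += 1
        node = db.get(static_vid(nibbles, depth))
        if node is not None:
            break
        depth -= 1
    # Phase 2: the guess was too shallow, so descend one level at a time.
    # (After a back-off this re-probes a known-missing child once; a real
    # implementation would likely avoid that extra lookup.)
    while node is not None and node[0] == "branch" and depth < len(nibbles):
        depth += 1
        lookups += 1
        node = db.get(static_vid(nibbles, depth))
    if node is not None and node[0] == "leaf" and node[1] == tuple(nibbles):
        return node[2], lookups
    return None, lookups


# Tiny example trie: root branch -> branch under nibble 1 -> leaf at depth 2.
key = (1, 2, 3, 4)
db = {
    static_vid(key, 0): ("branch",),
    static_vid(key, 1): ("branch",),
    static_vid(key, 2): ("leaf", key, "account"),
}
```

With this toy trie, a correct guess of depth 2 finds the leaf in one lookup, while starting from the root (guess 0) takes three; a guess that is off by one in either direction takes two.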

Tree depth is estimated by keeping track of hits and misses and occasionally making an adjustment in the direction of the most misses.
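A minimal sketch of such an estimator, in Python for illustration only. The class name `DepthGuess`, the window size, and the rule "move one level when misses in one direction outnumber both the hits and the misses in the other direction" are assumptions; the PR text only says that hits and misses are tracked and the guess is occasionally adjusted toward the most misses.

```python
class DepthGuess:
    """Tracks how often a guessed trie depth was right, too shallow or too deep."""

    def __init__(self, depth=7, window=1024):
        self.depth = depth    # current guess
        self.window = window  # observations between adjustments (assumed)
        self._reset()

    def _reset(self):
        self.hits = 0
        self.too_shallow = 0  # actual node was deeper than the guess
        self.too_deep = 0     # actual node was shallower than the guess
        self.samples = 0

    def record(self, actual_depth):
        """Record the depth at which a node was actually found."""
        if actual_depth == self.depth:
            self.hits += 1
        elif actual_depth > self.depth:
            self.too_shallow += 1
        else:
            self.too_deep += 1
        self.samples += 1
        if self.samples >= self.window:
            self._adjust()

    def _adjust(self):
        # Move one level in the direction with the most misses, but only if
        # those misses also outnumber the hits (threshold is an assumption).
        if self.too_shallow > self.too_deep and self.too_shallow > self.hits:
            self.depth += 1
        elif self.too_deep > self.too_shallow and self.too_deep > self.hits:
            self.depth = max(0, self.depth - 1)
        self._reset()
```

Adjusting only once per window keeps the guess stable when occasional lookups land on unusually shallow or deep nodes.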

On average, this shaves about 25% off the import time for the first 15M blocks, where the lookup depth is guessed to be 7 levels - deepening the trie by one more level (when more accounts are eventually added) would see even better performance.

Using 8 levels of statically assigned ids leaves 2**32 values for dynamic ids / storage slots - this should be more than enough for any foreseeable lifetime of the application, especially because large parts of the "current" usage of the VertexID space remain used by actual nodes.

The resulting lookup structure can be thought of as a hybrid between fully path-based lookups and the current "sparse" id mapping.

blocks: 15721472, baseline: 102h33m7s, contender: 77h4m49s
Time (total): -25h28m18s, -24.84%
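
For anyone who wants to re-check the figures above, the arithmetic can be reproduced with a few lines of Python. The helper `parse_duration` is hypothetical and only understands the `NhNmNs` format used in this report.

```python
import re


def parse_duration(text):
    """Convert a string such as '102h33m7s' to seconds (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)h(\d+)m(\d+)s", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    hours, minutes, seconds = map(int, match.groups())
    return hours * 3600 + minutes * 60 + seconds


baseline = parse_duration("102h33m7s")
contender = parse_duration("77h4m49s")

saved = baseline - contender         # seconds saved by the contender
saved_pct = 100 * saved / baseline   # reduction in total import time
speedup = baseline / contender       # same result expressed as throughput

print(f"saved {saved}s ({saved_pct:.2f}%), {speedup:.2f}x throughput")
```

The time saved is 91698 seconds, i.e. 25h28m18s or 24.84% of the baseline, which corresponds to roughly 1.33x the baseline import throughput.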

requires resync

made with coffee sponsored by @0x-r4bbit :)

@arnetheduck arnetheduck marked this pull request as draft May 1, 2025 11:17
@arnetheduck arnetheduck changed the title Static vid Path-based VertexID May 1, 2025
@arnetheduck arnetheduck force-pushed the static-vid branch 4 times, most recently from c1dc14f to cf176d3 Compare May 8, 2025 10:38
@arnetheduck arnetheduck force-pushed the static-vid branch 2 times, most recently from 2aaafe1 to 49ffbd8 Compare May 15, 2025 05:52

fix off-by-one

cleanups
@arnetheduck arnetheduck marked this pull request as ready for review May 22, 2025 12:25
jakubgs commented May 27, 2025

Infra issue to ping us in once this is ready to merge so we can arrange the new DB for nodes:

@advaita-saha advaita-saha merged commit 7864cdc into master May 29, 2025
23 checks passed
@advaita-saha advaita-saha deleted the static-vid branch May 29, 2025 08:54