State corruption after unexpected terminations during snap sync

This issue was discovered by [greg](https://github.com/greg7mdp) (kudos for the debugging assistance!).

The Geth node panicked during snap sync, and a few missing trie nodes were detected during snapshot generation. All of the missing nodes are in storage tries, specifically the topmost trie nodes along one or two trie paths.

Incomplete storage tries are quite common during snap sync, because large storage tries are retrieved in chunks (storage chunkification) and state healing is expected to fill in the gaps. The strange part is that the shortNode containing the associated account data exists in the account trie, which prevents state healing from refilling the missing storage trie nodes.
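A toy model of the healing rule makes the problem concrete. The sketch below is not geth's `trie.Sync`; it assumes a simplified healer that walks the trie top-down from the state root and treats any node already present on disk as the root of a complete subtrie, so it never descends below it. The `node` type, `exampleRemote`, `healRequests`, and all node names are hypothetical.

```go
package main

import "fmt"

// node is a hypothetical, heavily simplified trie node that only records the
// hashes of the nodes it references. For an account leaf, the single
// reference is the root of that account's storage trie.
type node struct {
	children []string
}

// exampleRemote returns a tiny authoritative state (as peers would serve it):
// an account trie root, one account leaf X, and X's two-slot storage trie.
// All names are made up for illustration.
func exampleRemote() map[string]node {
	return map[string]node{
		"state-root":     {children: []string{"account-leaf-X"}},
		"account-leaf-X": {children: []string{"storage-root-X"}},
		"storage-root-X": {children: []string{"slot-1", "slot-2"}},
		"slot-1":         {},
		"slot-2":         {},
	}
}

// healRequests returns, in breadth-first order, the hashes this simplified
// healer would fetch. A node already present locally is assumed to root a
// complete subtrie, so the walk never looks below it.
func healRequests(remote map[string]node, local map[string]bool, root string) []string {
	var requests []string
	queue := []string{root}
	for len(queue) > 0 {
		hash := queue[0]
		queue = queue[1:]
		if local[hash] {
			continue // present on disk: subtrie assumed complete
		}
		requests = append(requests, hash)
		queue = append(queue, remote[hash].children...)
	}
	return requests
}

func main() {
	remote := exampleRemote()

	// X's storage root is missing, but the leftover account leaf is present.
	leftover := map[string]bool{"account-leaf-X": true, "slot-1": true, "slot-2": true}
	fmt.Println(healRequests(remote, leftover, "state-root")) // [state-root]

	// Same gap in the storage trie, but without the leftover account leaf.
	clean := map[string]bool{"slot-1": true, "slot-2": true}
	fmt.Println(healRequests(remote, clean, "state-root")) // [state-root account-leaf-X storage-root-X]
}
```

With the account leaf on disk, `storage-root-X` is never requested; once the leaf is absent, the walk reaches it and the storage gap is refilled.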

After debugging it for a while, I realized that it's caused by redoing the state sync after the unexpected termination.

Specifically, in sync cycle A, the storage trie of account X was fully synchronized and properly persisted to disk. The associated account data was also inserted into the account trie and flushed to disk, signalling that the storage trie was complete and no healing was required. However, a panic then terminated the process before the snap sync progress indicator was saved.

In sync cycle B, after the restart, the storage retrieval of account X was redone from the old sync progress indicator. This time the storage was split into several chunks, and several trie nodes on the chunk boundary path were deleted from disk. Since the storage trie in this cycle was incomplete, account X was tagged as "needHeal" and the account data itself was discarded. In theory, this mechanism guarantees that a healing pass will run and refill all missing trie nodes in both the account trie and the storage trie. However, the account data persisted to disk in cycle A was not deleted in cycle B. This leftover trie node with account data blocks state healing, because healing treats its presence as proof that the storage trie is complete.
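The two cycles can be replayed in a small simulation. This is a hypothetical sketch, not geth code: the `db` type, the `cycleA`/`cycleB`/`corrupted` functions, and the node names are made up, and the chunked re-sync in cycle B is reduced to deleting the single boundary-path node at the top of X's storage trie.

```go
package main

import "fmt"

// db is a hypothetical stand-in for the node's key-value store: the set of
// trie nodes on disk plus the persisted sync progress indicator.
type db struct {
	nodes    map[string]bool
	progress string // account whose storage sync restarts after a relaunch
}

// Hypothetical node names for account X.
const (
	accountLeafX = "account-leaf-X" // account-trie shortNode holding X's data
	storageRootX = "storage-root-X" // topmost node of X's storage trie
	slot1X       = "slot-1-X"
	slot2X       = "slot-2-X"
)

// cycleA: X's storage arrives in one piece, so its trie is complete and the
// account leaf is flushed too. The process then panics before the progress
// indicator is advanced past X.
func cycleA(d *db) {
	for _, n := range []string{storageRootX, slot1X, slot2X, accountLeafX} {
		d.nodes[n] = true
	}
	// crash here: d.progress still equals "X"
}

// cycleB: restarting from the stale indicator, X's storage is fetched again,
// this time in chunks. The boundary-path node is deleted, X is tagged
// needHeal and its account data is discarded, but nothing deletes the
// account leaf that cycle A already persisted.
func cycleB(d *db) (needHeal []string) {
	if d.progress != "X" {
		return nil
	}
	delete(d.nodes, storageRootX) // boundary-path node removed
	return []string{"X"}
}

// corrupted reports the harmful combination: X's storage trie is incomplete,
// yet the account leaf that makes healing skip it is still on disk.
func corrupted(d *db) bool {
	return d.nodes[accountLeafX] && !d.nodes[storageRootX]
}

func main() {
	d := &db{nodes: map[string]bool{}, progress: "X"}
	cycleA(d)
	needHeal := cycleB(d)
	fmt.Println("needHeal:", needHeal, "corrupted:", corrupted(d)) // needHeal: [X] corrupted: true
}
```

In this toy, the only difference from a healable state is the surviving `accountLeafX`: deleting it makes `corrupted` return false, matching the claim that the leftover node is what blocks healing.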


---

![Untitled (Draft)-6](https://github.com/user-attachments/assets/250e8c67-610d-4de7-8c83-e8fe7b84debe)

The leftover node of account X in cycle B breaks state healing.

---

Original bug report: https://github.com/ethereum/go-ethereum/issues/30149

State corruption after unexpected terminations during snap sync #30229