This repository was archived by the owner on Sep 8, 2025. It is now read-only.
It should be easier to debug consensus failures #1896
Open
Description
What is wrong?
It took me a while to diagnose #1893. Making this kind of debugging easier would be helpful during emergencies when Trinity has production users.
- In order to reproduce the error I was able to use the gethimport script to re-attempt processing the failing block. This was incredibly useful!
- Documentation on using gethimport, or instructions for using ipython (and how to get an rlp-encoded version of the offending block), would be great things to add to a playbook.
- When the block failed to import, I got an error mentioning mix hashes; apparently the POW check failed to validate. This is a very misleading error. Luckily, I had experience with this and knew that commenting out the POW check would help me find the root cause.
- Switching the order of the checks, or catching the POW exception and raising the more useful exception when it applies, would make debugging faster and more intuitive.
- After commenting out the POW check I got a better exception, which showed that when Trinity processed the block, it created a block with a different state root, receipts root, and gas cost. In order to drill further down I needed to add some code to `BaseVM.import_block` which printed each receipt. I was able to compare that with the output from the gethimport script (there's a command to dump the receipts for a block). I then manually compared gas costs until I found the first receipt which did not match.
- This would have been easier if there were some kind of command to dump the receipts Trinity generated during block processing. This could be DEBUG output, or maybe a method callable from ipython which disables all checks and returns as much state as it can?
- After figuring out which transaction had been processed incorrectly, I asked Geth to create a trace of it. In order to get a trace from geth I had to run `geth attach` and then `debug.standardTraceBlockToFile("[block hash]", {reexec:6000000})`.
- Finding `standardTraceBlockToFile` and the `reexec` option required asking in geth's Discord channel. More familiarity with geth's docs, or some kind of "consensus failure" playbook, would have saved me time there.
- Trinity has no method of creating a trace; it only has DEBUG2 output. This made comparing traces extremely difficult. The Trinity trace was 49236 lines, and the geth trace was 23869 lines. Geth reports gas costs in hexadecimal, while Trinity reports them in decimal.
- If Trinity used the standard trace format I could have run diff and looked at the first place they diverged.
- Once the offending instruction's gas cost had been identified, finding the actual bug was pretty easy!
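
The suggestion about the POW check (catch the seal failure and raise the more useful exception when it applies) could look roughly like the sketch below. All names in it (`SealError`, `BlockMismatch`, `validate_imported_block`, the dict-based headers) are hypothetical and are not Trinity or py-evm APIs; it only illustrates the ordering of the checks.

```python
class SealError(Exception):
    """Hypothetical stand-in for a failed POW (mix hash) check."""


class BlockMismatch(Exception):
    """Hypothetical error carrying the header fields that differ."""


# Fields the issue reports as differing after re-execution.
COMPARED_FIELDS = ("state_root", "receipt_root", "gas_used")


def validate_imported_block(expected, computed, check_seal):
    """Validate a re-executed block, preferring the most specific error.

    `expected` and `computed` are plain dicts standing in for headers.
    `check_seal` is any callable that raises SealError on failure.
    """
    mismatches = {
        field: (expected[field], computed[field])
        for field in COMPARED_FIELDS
        if expected[field] != computed[field]
    }
    try:
        check_seal(computed)
    except SealError as exc:
        if mismatches:
            # Execution diverged, so report that and keep the seal
            # failure attached as the cause instead of hiding it.
            raise BlockMismatch(mismatches) from exc
        raise
    if mismatches:
        raise BlockMismatch(mismatches)
```

With this ordering, a genuine seal failure on an otherwise matching block still surfaces as `SealError`, while a block whose execution diverged reports the differing fields first.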
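
The manual receipt comparison described above amounts to finding the first transaction whose gas usage differs between the two clients. A minimal sketch, assuming each client's receipts have already been reduced to a list of cumulative gas-used integers in transaction order (the function name and input shape are illustrative, not an existing Trinity or geth interface):

```python
def first_receipt_mismatch(ours, theirs):
    """Return the index of the first transaction whose receipt differs.

    Both arguments are lists of cumulative gas used, one entry per
    transaction. Because the values are cumulative, the first index
    where they differ is the first transaction that was processed
    differently. Returns None when the lists are identical.
    """
    for index, (a, b) in enumerate(zip(ours, theirs)):
        if a != b:
            return index
    if len(ours) != len(theirs):
        # One client produced extra receipts past the common prefix.
        return min(len(ours), len(theirs))
    return None


# Example with made-up numbers: the third transaction diverges.
trinity_gas = [21000, 50000, 90000, 120000]
geth_gas = [21000, 50000, 90500, 120500]
mismatch = first_receipt_mismatch(trinity_gas, geth_gas)
```

Every receipt after the first mismatch will also differ, so only the earliest index is interesting when looking for the root cause.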
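
The hexadecimal-versus-decimal mismatch and the "first place they diverged" check can be sketched as below. It assumes both traces have already been parsed into one `(opcode, gas_cost)` pair per executed instruction, which is an illustrative shape and not the actual geth or Trinity output format; gas costs may be ints, decimal strings, or `0x`-prefixed hex strings.

```python
def to_int(value):
    """Accept an int, a decimal string, or a 0x-prefixed hex string."""
    if isinstance(value, int):
        return value
    text = value.strip()
    if text.lower().startswith("0x"):
        return int(text, 16)
    return int(text, 10)


def first_divergence(trace_a, trace_b):
    """Return the index of the first step where two traces disagree.

    Each trace is a sequence of (opcode, gas_cost) pairs. Returns None
    when the traces match step for step and have the same length.
    """
    for step, ((op_a, gas_a), (op_b, gas_b)) in enumerate(zip(trace_a, trace_b)):
        if op_a != op_b or to_int(gas_a) != to_int(gas_b):
            return step
    if len(trace_a) != len(trace_b):
        # One trace continues after the other has ended.
        return min(len(trace_a), len(trace_b))
    return None


# Example with made-up steps: hex on one side, decimal on the other.
geth_steps = [("PUSH1", "0x3"), ("SSTORE", "0x4e20")]
trinity_steps = [("PUSH1", 3), ("SSTORE", "5000")]
diverged_at = first_divergence(geth_steps, trinity_steps)
```

Normalising both sides to integers before comparing is what makes the two clients' output comparable; getting real DEBUG2 output into this shape is the part that a standard trace format would make unnecessary.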