Fix flaky reconstruction test #8321
Merged
mergify
merged 3 commits into
sigp:unstable
from
jimmygchen:fix-flaky-reconstruction-test
Nov 10, 2025
… complete. Previously it wasn't consistently failing because processing the remaining 64 columns *may* give us enough time to finish reconstruction, but this is still flaky.
michaelsproul
approved these changes
Nov 10, 2025
Member
michaelsproul
left a comment
LGTM, I want this change after hitting the failure on one of my PRs.
GIMME!
kevaundray
added a commit
to kevaundray/lighthouse
that referenced
this pull request
Nov 11, 2025
* Optimise `state_root_at_slot` for finalized slot (sigp#8353)

  This is an optimisation targeted at Fulu networks in non-finality.

  While debugging on Holesky, we found that `state_root_at_slot` was being called from `prepare_beacon_proposer` a lot, for the finalized state:

  https://github.com/sigp/lighthouse/blob/2c9b670f5d313450252c6cb40a5ee34802d54fef/beacon_node/http_api/src/lib.rs#L3860-L3861

  This was causing `prepare_beacon_proposer` calls to take upwards of 5 seconds, sometimes 10 seconds, because it would trigger _multiple_ beacon state loads in order to iterate back to the finalized slot.

  Ideally, loading the finalized state should be quick because we keep it cached in the state cache (technically we keep the split state, but they usually coincide). Instead we are computing the finalized state root separately (slow), and then loading the state from the cache (fast).

  Although it would be possible to make the API faster by removing the `state_root_at_slot` call, I believe it's simpler to change `state_root_at_slot` itself and remove the footgun. Devs rightly expect operations involving the finalized state to be fast.

  Co-Authored-By: Michael Sproul <[email protected]>

* Remove Windows CI jobs (sigp#8362)

  Remove all Windows-related CI jobs

  Co-Authored-By: antondlr <[email protected]>

* Update proposer-only section in the documentation (sigp#8358)

  Co-Authored-By: Tan Chee Keong <[email protected]>
  Co-Authored-By: Michael Sproul <[email protected]>

* Fix unaggregated delay metric (sigp#8366)

  While working on sigp#7892, @michaelsproul pointed out that it might be a good metric to measure the delay from the start of the slot instead of the current `slot_duration / 3`, since attestation duties now start before the `1/3rd` mark with the change in the linked PR.

  Co-Authored-By: hopinheimer <[email protected]>
  Co-Authored-By: hopinheimer <[email protected]>

* Downgrade and remove unnecessary logs (sigp#8367)

  ### Downgrade a non error to `Debug`

  I noticed this error on one of our hoodi nodes:

  ```
  Nov 04 05:13:38.892 ERROR Error during data column reconstruction block_root: 0x4271b9efae7deccec3989bd2418e998b83ce8144210c2b17200abb62b7951190, error: DuplicateFullyImported(0x4271b9efae7deccec3989bd2418e998b83ce8144210c2b17200abb62b7951190)
  ```

  This shouldn't be logged as an error: it's due to a normal race condition, and it doesn't impact the node negatively.

  ### Remove spammy logs

  This log is filling up the log files quite quickly and it is also something we'd expect during normal operation - getting columns via EL before gossip. We haven't found this debug log to be useful, so I propose we remove it to avoid spamming debug logs.

  ```
  Received already available column sidecar. Ignoring the column sidecar
  ```

  In the process of removing this, I noticed we aren't propagating the validation result, which I think we should, so I've added this. The impact should be quite minimal - the message will stay in the gossip memcache for a bit longer but should be evicted in the next heartbeat.

  Co-Authored-By: Jimmy Chen <[email protected]>

* Prepare `sensitive_url` for `crates.io` (sigp#8223)

  Another good candidate for publishing separately from Lighthouse is `sensitive_url` as it's a general utility crate and not related to Ethereum. This PR prepares it to be spun out into its own crate.

  I've made the `full` field on `SensitiveUrl` private and instead provided an explicit getter called `.expose_full()`. It's a bit ugly for the diff but I prefer the explicit nature of the getter. I've also added some extra tests and doc strings along with feature gating `Serialize` and `Deserialize` implementations behind the `serde` feature.

  Co-Authored-By: Mac L <[email protected]>

* Remove ecdsa feature of libp2p (sigp#8374)

  This compiles, is there any reason to keep `ecdsa`? CC @jxs

  Co-Authored-By: Michael Sproul <[email protected]>

* CI workflows to use warpbuild ci runner (sigp#8343)

  Self hosted GitHub Runners review and improvements

  local testnet workflow now uses warpbuild ci runner

  Co-Authored-By: lemon <[email protected]>
  Co-Authored-By: antondlr <[email protected]>

* Remove `sensitive_url` and import from `crates.io` (sigp#8377)

  Use the recently published `sensitive_url` and remove it from Lighthouse

  Co-Authored-By: Mac L <[email protected]>

* Migrate derivative to educe (sigp#8125)

  Fixes sigp#7001.

  Mostly mechanical replacement of `derivative` attributes with `educe` ones.

  ### **Attribute Syntax Changes**

  ```rust
  // Bounds: = "..." → (...)
  #[derivative(Hash(bound = "E: EthSpec"))]
  #[educe(Hash(bound(E: EthSpec)))]

  // Ignore: = "ignore" → (ignore)
  #[derivative(PartialEq = "ignore")]
  #[educe(PartialEq(ignore))]

  // Default values: value = "..." → expression = ...
  #[derivative(Default(value = "ForkName::Base"))]
  #[educe(Default(expression = ForkName::Base))]

  // Methods: format_with/compare_with = "..." → method(...)
  #[derivative(Debug(format_with = "fmt_peer_set_as_len"))]
  #[educe(Debug(method(fmt_peer_set_as_len)))]

  // Empty bounds: removed entirely, educe can infer appropriate bounds
  #[derivative(Default(bound = ""))]
  #[educe(Default)]

  // Transparent debug: manual implementation (educe doesn't support it)
  #[derivative(Debug = "transparent")]
  // Replaced with manual Debug impl that delegates to inner field
  ```

  **Note**: Some bounds use strings (`bound("E: EthSpec")`) for superstruct compatibility (`expected ','` errors).

  Co-Authored-By: Javier Chávarri <[email protected]>
  Co-Authored-By: Mac L <[email protected]>

* Fix flaky reconstruction test (sigp#8321)

  Fix flaky tests that depend on timing. Previously the test processed all 128 columns and expected reconstruction to happen after all columns were processed. There is a race here, and reconstruction could be triggered before all columns are processed.

  I've updated the tests to process 64 columns, just enough for reconstruction, and wait 50ms for reconstruction to be triggered.

  This PR requires the change made in sigp#8194 for the test to pass consistently (blob count set to 1 for all blocks instead of random blob count between 0..max)

  Co-Authored-By: Jimmy Chen <[email protected]>
  Co-Authored-By: Jimmy Chen <[email protected]>

* Remove `ethers-core` from `execution_layer` (sigp#8149)

  sigp#6022

  Use `alloy_rpc_types::Transaction` to replace the `ethers_core::Transaction` inside the execution block generator.

  Co-Authored-By: Mac L <[email protected]>

* Include block root in publish block logs (sigp#8111)

  Debugging sigp#8104 it would have been helpful to quickly see in the logs that a specific block was submitted into the HTTP API. Because we want to optimize the block root computation we don't include it in the logs, and just log the block slot.

  I believe we can take a minute performance hit to have the block root in all the logs during block publishing.

  Co-Authored-By: dapplion <[email protected]>
  Co-Authored-By: Jimmy Chen <[email protected]>

* fix: clarify `bb` vs `bl` variable names in BeaconProcessorQueue (sigp#8315)

  since block and blob both start with `bl`, it was not clear how to differentiate between `blbroots_queue` and `bbroots_queue`

  After renaming, there also seems to be a discrepancy

  Co-Authored-By: Kevaundray Wedderburn <[email protected]>

* Migrate the `deposit_contract` crate to `alloy` (sigp#8139)

  sigp#6022

  Switches the `deposit_contract` crate to use the `alloy` ecosystem and removes the dependency on `ethabi`

  Co-Authored-By: Mac L <[email protected]>

---------

Co-authored-by: Michael Sproul <[email protected]>
Co-authored-by: Michael Sproul <[email protected]>
Co-authored-by: antondlr <[email protected]>
Co-authored-by: chonghe <[email protected]>
Co-authored-by: hopinheimer <[email protected]>
Co-authored-by: Jimmy Chen <[email protected]>
Co-authored-by: Jimmy Chen <[email protected]>
Co-authored-by: Mac L <[email protected]>
Co-authored-by: lmnzx <[email protected]>
Co-authored-by: Javier Chávarri <[email protected]>
Co-authored-by: Lion - dapplion <[email protected]>
Issue Addressed
Fix flaky tests that depend on timing. Previously the test processed all 128 columns and expected reconstruction to happen after all columns were processed. There is a race here: reconstruction could be triggered before all columns are processed.
I've updated the tests to process 64 columns, just enough for reconstruction, and wait 50ms for reconstruction to be triggered.
This PR requires the change made in #8194 for the test to pass consistently (blob count set to 1 for all blocks instead of a random blob count between 0..max).
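For readers unfamiliar with the race, here is a minimal, self-contained sketch of why the old assumption was fragile. It is not the Lighthouse test: `ColumnTracker`, `process_column` and `columns_at_trigger` are hypothetical names, the 64-of-128 threshold is taken from the description above, and the channel with a timeout stands in for the 50ms wait. The sketch is single-threaded, so it only shows *when* the trigger fires, not the concurrent reconstruction work itself.

```rust
use std::collections::HashSet;
use std::sync::mpsc::{self, Sender};
use std::time::Duration;

/// Total number of data columns per block, as stated in the PR description.
const NUM_COLUMNS: usize = 128;
/// Columns needed before reconstruction can start (64, "just enough").
const RECONSTRUCTION_THRESHOLD: usize = NUM_COLUMNS / 2;

/// Hypothetical stand-in for the component that tracks received columns.
struct ColumnTracker {
    received: HashSet<usize>,
    triggered: bool,
    notify: Sender<usize>,
}

impl ColumnTracker {
    fn new(notify: Sender<usize>) -> Self {
        Self { received: HashSet::new(), triggered: false, notify }
    }

    /// Records a column and signals reconstruction exactly once, as soon as
    /// the threshold is reached. The signal carries the column count at that
    /// moment.
    fn process_column(&mut self, index: usize) {
        self.received.insert(index);
        if !self.triggered && self.received.len() >= RECONSTRUCTION_THRESHOLD {
            self.triggered = true;
            // Ignore send errors: the receiver may already be gone.
            let _ = self.notify.send(self.received.len());
        }
    }
}

/// Processes `count` columns, then waits up to `wait` for the reconstruction
/// signal. Returns how many columns had been received when it fired, or
/// `None` if it never fired within the timeout.
fn columns_at_trigger(count: usize, wait: Duration) -> Option<usize> {
    let (tx, rx) = mpsc::channel();
    let mut tracker = ColumnTracker::new(tx);
    for index in 0..count {
        tracker.process_column(index);
    }
    rx.recv_timeout(wait).ok()
}
```

Calling `columns_at_trigger(128, Duration::from_millis(50))` reports that the trigger fired at column 64, well before the last column was processed, which is the window the old test was implicitly racing against. Feeding exactly 64 columns and then waiting removes the dependence on how long the remaining columns take.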