
@dapplion
Collaborator

Issue Addressed

While checking Holesky nodes for "Lookup maybe stuck" logs, I found the following state:

May 19 09:00:04.308 DEBG Lookup maybe stuck                      summary: SingleBlockLookup { id: 6956, block_request_state: BlockRequestState { state: SingleLookupRequestState { state: AwaitingDownload, available_peers: 0, failed_processing: 0, failed_downloading: 0 } }, blob_request_state: BlobRequestState { state: SingleLookupRequestState { state: AwaitingDownload, available_peers: 1, failed_processing: 0, failed_downloading: 0 } }, block_root: 0x77921f4f47635e2a95efb4753625523aaba24a4636b7dae8544b45e894c04661, awaiting_parent: None, created: Instant { tv_sec: 20313759, tv_nsec: 73955749 } }, block_root: 0x77921f4f47635e2a95efb4753625523aaba24a4636b7dae8544b45e894c04661, id: 6956, service: lookup_sync, service: sync

This state is problematic because the block request has 0 available peers while the blob request has 1. The peer sets of all requests should be identical. The cause is this early return:

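pub fn remove_peer(&mut self, peer_id: &PeerId) -> bool {
    // `&&` short-circuits: if the block request's call returns false,
    // the blob request's remove_peer is never called.
    self.block_request_state.state.remove_peer(peer_id)
        && self.blob_request_state.state.remove_peer(peer_id)
}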
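The short-circuit can be reproduced with a small model. This is a sketch under stated assumptions: `RequestState`, `Lookup`, and the `u64` stand-in for `PeerId` are hypothetical simplifications rather than Lighthouse's real types, and each request's `remove_peer` is assumed to return `true` once its own peer set is empty. Under that assumption, two disconnects leave the same 0/1 split seen in the log above.

```rust
use std::collections::HashSet;

// Hypothetical stand-in for libp2p's PeerId.
type PeerId = u64;

// Simplified per-request state (hypothetical; the real type tracks more fields).
struct RequestState {
    available_peers: HashSet<PeerId>,
}

impl RequestState {
    // Assumed semantics: drop the peer, then report whether no peers remain.
    fn remove_peer(&mut self, peer_id: &PeerId) -> bool {
        self.available_peers.remove(peer_id);
        self.available_peers.is_empty()
    }
}

struct Lookup {
    block_request_state: RequestState,
    blob_request_state: RequestState,
}

impl Lookup {
    fn new(peers: &[PeerId]) -> Self {
        let set: HashSet<PeerId> = peers.iter().copied().collect();
        Lookup {
            block_request_state: RequestState { available_peers: set.clone() },
            blob_request_state: RequestState { available_peers: set },
        }
    }

    // Same shape as the quoted code: when the block call returns false,
    // `&&` short-circuits and the blob request never sees the removal.
    fn remove_peer_buggy(&mut self, peer_id: &PeerId) -> bool {
        self.block_request_state.remove_peer(peer_id)
            && self.blob_request_state.remove_peer(peer_id)
    }

    // Evaluating both calls before combining them avoids the short-circuit.
    fn remove_peer_eager(&mut self, peer_id: &PeerId) -> bool {
        let block_empty = self.block_request_state.remove_peer(peer_id);
        let blob_empty = self.blob_request_state.remove_peer(peer_id);
        block_empty && blob_empty
    }
}

fn main() {
    let mut buggy = Lookup::new(&[1, 2]);
    // Peer 1 leaves: the block set still holds peer 2, so the blob set is skipped.
    buggy.remove_peer_buggy(&1);
    // Peer 2 leaves: the block set empties, the blob set drops 2 but keeps 1.
    buggy.remove_peer_buggy(&2);
    println!(
        "buggy: block peers {}, blob peers {}",
        buggy.block_request_state.available_peers.len(),
        buggy.blob_request_state.available_peers.len()
    ); // prints "buggy: block peers 0, blob peers 1"

    let mut eager = Lookup::new(&[1, 2]);
    eager.remove_peer_eager(&1);
    eager.remove_peer_eager(&2);
    println!(
        "eager: block peers {}, blob peers {}",
        eager.block_request_state.available_peers.len(),
        eager.blob_request_state.available_peers.len()
    ); // prints "eager: block peers 0, blob peers 0"
}
```

Calling both methods eagerly is only a local patch. The PR takes a different route and removes the duplicated sets altogether.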

The tests did not catch the bug because the covered test case always returned an RPCError for all active requests. If a lookup sends only a block request (no blob request) and the peer disconnects, the lookup may get stuck.

Proposed Changes

Duplicating the list of peers (i.e. peers that claim to have imported the set of block components) between the block and blob requests is not necessary. This PR hoists the peer set out of the per-request state and into the lookup struct, which indirectly fixes the early-return bug.

It also adds a test covering the case where a lookup loses all its peers but does not receive an RPCError.
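The hoisted design and the new test scenario can be sketched together. As before, the types are hypothetical simplifications, not Lighthouse's actual structs. The lookup owns a single peer set, the request states only track download progress, and `main` plays the role of the test: only the block request is in flight, every peer disconnects, and no RPCError is delivered.

```rust
use std::collections::HashSet;

// Hypothetical stand-in for libp2p's PeerId.
type PeerId = u64;

// Simplified request state (hypothetical): it no longer owns a peer set.
#[derive(Debug, PartialEq)]
enum RequestState {
    AwaitingDownload,
    Downloading,
}

// Simplified lookup: one shared peer set for all block components.
struct Lookup {
    peers: HashSet<PeerId>,
    block_request_state: RequestState,
    blob_request_state: RequestState,
}

impl Lookup {
    fn new(peers: &[PeerId]) -> Self {
        Lookup {
            peers: peers.iter().copied().collect(),
            block_request_state: RequestState::AwaitingDownload,
            blob_request_state: RequestState::AwaitingDownload,
        }
    }

    // A single removal keeps the block and blob views consistent by construction.
    // Returns true when no peers remain.
    fn remove_peer(&mut self, peer_id: &PeerId) -> bool {
        self.peers.remove(peer_id);
        self.peers.is_empty()
    }
}

fn main() {
    // Hypothetical test scenario: only the block request is in flight,
    // the blob request is still awaiting download, and both peers
    // disconnect without any RPCError being delivered.
    let mut lookup = Lookup::new(&[1, 2]);
    lookup.block_request_state = RequestState::Downloading;

    assert!(!lookup.remove_peer(&1)); // peer 2 is still available
    assert!(lookup.remove_peer(&2)); // no peers left for any component
    assert!(lookup.peers.is_empty());
    assert_eq!(lookup.blob_request_state, RequestState::AwaitingDownload);

    println!(
        "block request: {:?}, peers left: {}",
        lookup.block_request_state,
        lookup.peers.len()
    );
}
```

Because a disconnect touches one shared set exactly once, no second call exists for an early return to skip. This matches the PR's point that hoisting the set fixes the bug indirectly.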

Member

@realbigsean realbigsean left a comment


Nice, this makes a lot of sense considering the other recent changes.

@realbigsean
Member

@mergify queue

@mergify

mergify bot commented May 20, 2024

queue

✅ The pull request has been merged automatically

The pull request has been merged automatically at 2a87016

@mergify mergify bot merged commit 2a87016 into sigp:unstable May 20, 2024
@dapplion dapplion deleted the fix-lookup-disconnect-peer branch May 22, 2024 08:49