Conversation

@cavemanloverboy

Problem

#8356 removed the coalescing in the TPU, which removes the need for a batching thread. Now, any batching is done by sigverify.

Summary of Changes

Removes the packet batch sender, sending from the streamer to sigverify directly.

This also removes 3 threads from the validator (quic tpu, quic tpu fwd, quic tpu vote).

This is a draft PR. There is a single allocation done per packet, which I am not a fan of. We could get around this by introducing a PacketBatch::Single(BytesPacket), but I don't want to make too many changes before getting this rough draft out and collecting feedback.
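
For illustration only, a minimal sketch of what a Single variant could look like; the BytesPacket stand-in and the variant set here are assumptions for this sketch, not the actual agave definitions.

// Minimal sketch only: types and variant names are stand-ins.
use bytes::Bytes;

/// Stand-in for the real BytesPacket type (a packet backed by Bytes).
pub struct BytesPacket {
    pub buffer: Bytes,
}

/// Hypothetical shape: a Single variant lets the streamer hand one packet
/// to sigverify without wrapping it in a per-packet Vec.
pub enum PacketBatch {
    Single(BytesPacket),
    Bytes(Vec<BytesPacket>),
}

impl PacketBatch {
    /// Number of packets in the batch.
    pub fn len(&self) -> usize {
        match self {
            PacketBatch::Single(_) => 1,
            PacketBatch::Bytes(v) => v.len(),
        }
    }
}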

@mergify mergify bot requested a review from a team October 30, 2025 20:53
@alexpyattaev alexpyattaev added the CI Pull Request is ready to enter CI label Oct 30, 2025
@anza-team anza-team removed the CI Pull Request is ready to enter CI label Oct 30, 2025
@codecov-commenter

Codecov Report

❌ Patch coverage is 87.87879% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.2%. Comparing base (cbc2ffb) to head (353da1b).
⚠️ Report is 8 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff            @@
##           master    #8786    +/-   ##
========================================
  Coverage    83.2%    83.2%            
========================================
  Files         863      863            
  Lines      374306   374157   -149     
========================================
- Hits       311456   311351   -105     
+ Misses      62850    62806    -44     

@lijunwangs

With the change

Received txns count: total: 1496103, rate 164939/s
Received txns count: total: 2328107, rate 166400/s
Received txns count: total: 3168597, rate 168098/s
Received txns count: total: 4007734, rate 167827/s
Received txns count: total: 4832852, rate 165023/s

Without (against cbc2ffb):

Received txns count: total: 1055977, rate 160998/s
Received txns count: total: 1864330, rate 161670/s
Received txns count: total: 2669271, rate 160988/s
Received txns count: total: 3472916, rate 160729/s
Received txns count: total: 4278771, rate 161171/s

It seems to have better throughput.

@alexpyattaev alexpyattaev self-requested a review October 31, 2025 06:36
@cavemanloverboy
Author

cavemanloverboy commented Oct 31, 2025

OK, let's try to remove this one allocation per packet and do better. How do you want to do it? PacketBatch::Single? I can also change the channel to send a BytesPacket instead. Then, one allocation per batch can occur in sigverify's streamer recv. I kind of like the latter more; we can pre-allocate for ≈128 per batch or something.
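
For concreteness, a minimal sketch of that batched recv, assuming a crossbeam channel carrying BytesPacket; the function name, the BytesPacket stand-in, and the ≈128 cap are placeholders taken from this comment, not the actual streamer code.

use std::time::Duration;
use crossbeam_channel::{Receiver, RecvTimeoutError};

/// Stand-in for the real BytesPacket type.
pub struct BytesPacket(pub Vec<u8>);

/// Block for the first packet, then drain whatever else is already queued,
/// up to a fixed cap, so one Vec allocation covers the whole batch.
pub fn recv_batch(
    receiver: &Receiver<BytesPacket>,
    max_batch: usize, // e.g. ~128 as suggested above
    timeout: Duration,
) -> Result<Vec<BytesPacket>, RecvTimeoutError> {
    let first = receiver.recv_timeout(timeout)?;
    let mut batch = Vec::with_capacity(max_batch);
    batch.push(first);
    while batch.len() < max_batch {
        match receiver.try_recv() {
            Ok(packet) => batch.push(packet),
            Err(_) => break, // channel empty (or disconnected); stop draining
        }
    }
    Ok(batch)
}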

@t-nelson

Probably a new PacketBatch variant first, then remove PacketBatch in a followup. I imagine the latter will dirty the diff a lot.

@cavemanloverboy
Author

@lijunwangs can you rerun bench?

@alexpyattaev

I can also change the channel to send a BytesPacket instead.

This sounds good. Fewer layers of abstraction is always better.

@cavemanloverboy
Author

cavemanloverboy commented Oct 31, 2025

Should we delete the GPU code and PacketBatch in a followup? hehe.

Then, sigverify just uses a Vec<BytesPacket>.

@t-nelson

Should we delete the GPU code and PacketBatch in a followup?

Two followups. Yes.

@cavemanloverboy cavemanloverboy marked this pull request as ready for review November 1, 2025 15:51
@diman-io

diman-io commented Nov 2, 2025

Maybe a dumb question. I always thought packet batches are needed here because of the GPU abstraction - if there’s no GPU, are they still actually necessary? I don’t know how par_iter works, but wouldn’t it be better to have several threads concurrently consuming one packet at a time from a queue?

UPD.

are they still actually necessary?

I mean a vector

@alexpyattaev

Maybe a dumb question. I always thought packet batches are needed here because of the GPU abstraction - if there’s no GPU, are they still actually necessary?

Not necessary. Technically, pulling packets off the channel one at a time has its own overheads, so batching does help with that, but if we can almost never fill a batch it is useless in that role.

I don’t know how par_iter works

It splits a vector into a bunch of slices (one per thread in the pool) and gives each slice to a thread. Once a thread is done with its own slice, it will try to "steal" work from other threads that are still busy.
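
For illustration only, a tiny rayon example of that split-then-steal pattern; the toy packets and the checksum check are made up for the sketch, not the sigverify code.

use rayon::prelude::*;

fn main() {
    // Toy "packets": 1024 small byte buffers.
    let packets: Vec<Vec<u8>> = (0..1024).map(|i| vec![i as u8; 32]).collect();

    // par_iter splits the slice across the thread pool; idle threads
    // steal chunks from busy ones, so uneven work still balances out.
    let results: Vec<bool> = packets
        .par_iter()
        .map(|packet| packet.iter().fold(0u8, |acc, b| acc.wrapping_add(*b)) != 0)
        .collect();

    println!("processed {} packets", results.len());
}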

@steviez

steviez commented Nov 3, 2025

Maybe a dumb question. I always thought packet batches are needed here because of the GPU abstraction - if there’s no GPU, are they still actually necessary?

Not necessary. Technically, pulling packets off the channel one at a time has its own overheads, so batching does help with that, but if we can almost never fill a batch it is useless in that role.

Yes, the PacketBatch type was necessary for the GPU code instead of just a Vec<Packet>. PacketBatch used to just be a PinnedVec<Packet>, where the definition of PinnedVec is:

// A vector wrapper where the underlying memory can be
// page-pinned. Controlled by flags in case user only wants
// to pin in certain circumstances.
#[cfg_attr(feature = "frozen-abi", derive(AbiExample))]
#[derive(Debug, Default, Serialize, Deserialize)]
pub struct PinnedVec<T: Default + Clone + Sized> {
x: Vec<T>,
pinned: bool,
pinnable: bool,
#[serde(skip)]
recycler: Weak<RecyclerX<PinnedVec<T>>>,
}

PacketBatch has since become an enum; I don't recall if that other type is GPU friendly or not. But as mentioned, GPU has been on our potential kill list for a while (#3817 is a previous attempt that stalled out)

@t-nelson

t-nelson commented Nov 3, 2025

PacketBatch has since become an enum; I don't recall if that other type is GPU friendly or not. But as mentioned, GPU has been on our potential kill list for a while (#3817 is a previous attempt that stalled out)

shh! quiet part out loud
