Conversation

@cavemanloverboy

Problem

#8356 removed the coalescing in the TPU, which removes the need for a batching thread. Now, any batching is done by sigverify.

Summary of Changes

Removes the packet batch sender, sending from the streamer to sigverify directly.

This also removes 3 threads from the validator (quic tpu, quic tpu fwd, quic tpu vote).

This is a draft PR. There is a single allocation done per packet, which I am not a fan of. We could get around this by introducing a PacketBatch::Single(BytesPacket), but I don't want to make too many changes before getting this rough draft out and collecting feedback.
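
For illustration only, a minimal sketch of what a Single variant could look like; the BytesPacket stand-in and the variant set here are assumptions for this sketch, not the actual agave definitions.

// Minimal sketch only: types and variant names are stand-ins.
use bytes::Bytes;

/// Stand-in for the real BytesPacket type (a packet backed by Bytes).
pub struct BytesPacket {
    pub buffer: Bytes,
}

/// Hypothetical shape: a Single variant lets the streamer hand one packet
/// to sigverify without wrapping it in a per-packet Vec.
pub enum PacketBatch {
    Single(BytesPacket),
    Bytes(Vec<BytesPacket>),
}

impl PacketBatch {
    /// Number of packets in the batch.
    pub fn len(&self) -> usize {
        match self {
            PacketBatch::Single(_) => 1,
            PacketBatch::Bytes(v) => v.len(),
        }
    }
}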

@mergify mergify bot requested a review from a team October 30, 2025 20:53
@alexpyattaev alexpyattaev added the CI Pull Request is ready to enter CI label Oct 30, 2025
@anza-team anza-team removed the CI Pull Request is ready to enter CI label Oct 30, 2025
@codecov-commenter

Codecov Report

❌ Patch coverage is 87.87879% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.2%. Comparing base (cbc2ffb) to head (353da1b).
⚠️ Report is 8 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff            @@
##           master    #8786    +/-   ##
========================================
  Coverage    83.2%    83.2%            
========================================
  Files         863      863            
  Lines      374306   374157   -149     
========================================
- Hits       311456   311351   -105     
+ Misses      62850    62806    -44     

@lijunwangs

With the change

Received txns count: total: 1496103, rate 164939/s
Received txns count: total: 2328107, rate 166400/s
Received txns count: total: 3168597, rate 168098/s
Received txns count: total: 4007734, rate 167827/s
Received txns count: total: 4832852, rate 165023/s

Without (against cbc2ffb):

Received txns count: total: 1055977, rate 160998/s
Received txns count: total: 1864330, rate 161670/s
Received txns count: total: 2669271, rate 160988/s
Received txns count: total: 3472916, rate 160729/s
Received txns count: total: 4278771, rate 161171/s

It seems to have better throughput.

@alexpyattaev alexpyattaev self-requested a review October 31, 2025 06:36
@cavemanloverboy
Author

cavemanloverboy commented Oct 31, 2025

OK, let's try to remove this one allocation per packet and do better. How do you want to do it? PacketBatch::Single? I can also change the channel to send a BytesPacket instead. Then, one allocation per batch can occur in sigverify's streamer recv. I kind of like the latter more; we can pre-allocate for ≈128 per batch or something.
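
For concreteness, a minimal sketch of that batched recv, assuming a crossbeam channel carrying BytesPacket; the function name, the BytesPacket stand-in, and the ≈128 cap are placeholders taken from this comment, not the actual streamer code.

use std::time::Duration;
use crossbeam_channel::{Receiver, RecvTimeoutError};

/// Stand-in for the real BytesPacket type.
pub struct BytesPacket(pub Vec<u8>);

/// Block for the first packet, then drain whatever else is already queued,
/// up to a fixed cap, so one Vec allocation covers the whole batch.
pub fn recv_batch(
    receiver: &Receiver<BytesPacket>,
    max_batch: usize, // e.g. ~128 as suggested above
    timeout: Duration,
) -> Result<Vec<BytesPacket>, RecvTimeoutError> {
    let first = receiver.recv_timeout(timeout)?;
    let mut batch = Vec::with_capacity(max_batch);
    batch.push(first);
    while batch.len() < max_batch {
        match receiver.try_recv() {
            Ok(packet) => batch.push(packet),
            Err(_) => break, // channel empty (or disconnected); stop draining
        }
    }
    Ok(batch)
}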

@t-nelson

Probably a new PacketBatch variant first, then remove PacketBatch in a followup. I imagine the latter will dirty the diff a lot.

@cavemanloverboy
Author

@lijunwangs can you rerun bench?

@alexpyattaev

I can also change the channel to send a BytesPacket instead.

This sounds good. Fewer layers of abstraction is always better.

@cavemanloverboy
Author

cavemanloverboy commented Oct 31, 2025

Should we delete the GPU code and PacketBatch in a followup? hehe.

Then, sigverify just uses a Vec<BytesPacket>.

@t-nelson

Should we delete the GPU code and PacketBatch in a followup?

Two followups. Yes.

@cavemanloverboy cavemanloverboy marked this pull request as ready for review November 1, 2025 15:51
@diman-io

diman-io commented Nov 2, 2025

Maybe a dumb question. I always thought packet batches are needed here because of the GPU abstraction - if there’s no GPU, are they still actually necessary? I don’t know how par_iter works, but wouldn’t it be better to have several threads concurrently consuming one packet at a time from a queue?

UPD.

are they still actually necessary?

I mean a vector

@alexpyattaev

Maybe a dumb question. I always thought packet batches are needed here because of the GPU abstraction - if there’s no GPU, are they still actually necessary?

Not necessary. Technically, pulling packets off the channel one at a time has its own overheads, so batching does help with that, but if we can almost never fill a batch it is useless in that role.

I don’t know how par_iter works

It splits a vector into a bunch of slices (one per thread in the pool) and gives each slice to a thread. Once a thread is done with its own slice, it will try to "steal" work from other threads that are still busy.
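
For illustration only, a tiny rayon example of that split-then-steal pattern; the toy packets and the checksum check are made up for the sketch, not the sigverify code.

use rayon::prelude::*;

fn main() {
    // Toy "packets": 1024 small byte buffers.
    let packets: Vec<Vec<u8>> = (0..1024).map(|i| vec![i as u8; 32]).collect();

    // par_iter splits the slice across the thread pool; idle threads
    // steal chunks from busy ones, so uneven work still balances out.
    let results: Vec<bool> = packets
        .par_iter()
        .map(|packet| packet.iter().fold(0u8, |acc, b| acc.wrapping_add(*b)) != 0)
        .collect();

    println!("processed {} packets", results.len());
}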

@steviez

steviez commented Nov 3, 2025

Maybe a dumb question. I always thought packet batches are needed here because of the GPU abstraction - if there’s no GPU, are they still actually necessary?

Not necessary. Technically, pulling packets off the channel one at a time has its own overheads, so batching does help with that, but if we can almost never fill a batch it is useless in that role.

Yes, the PacketBatch type was necessary for the GPU code instead of just a Vec<Packet>. PacketBatch used to just be a PinnedVec<Packet>, where the definition of PinnedVec is:

// A vector wrapper where the underlying memory can be
// page-pinned. Controlled by flags in case user only wants
// to pin in certain circumstances.
#[cfg_attr(feature = "frozen-abi", derive(AbiExample))]
#[derive(Debug, Default, Serialize, Deserialize)]
pub struct PinnedVec<T: Default + Clone + Sized> {
x: Vec<T>,
pinned: bool,
pinnable: bool,
#[serde(skip)]
recycler: Weak<RecyclerX<PinnedVec<T>>>,
}

PacketBatch has since become an enum; I don't recall if that other type is GPU friendly or not. But as mentioned, GPU has been on our potential kill list for a while (#3817 is a previous attempt that stalled out)

@t-nelson

t-nelson commented Nov 3, 2025

PacketBatch has since become an enum; I don't recall if that other type is GPU friendly or not. But as mentioned, GPU has been on our potential kill list for a while (#3817 is a previous attempt that stalled out)

shh! quiet part out loud
