ibrl: remove packet batch sender thread from streamer #8786
base: master

Conversation
Codecov Report

@@ Coverage Diff @@
## master    #8786   +/- ##
========================================
  Coverage    83.2%    83.2%
========================================
  Files         863      863
  Lines      374306   374157   -149
========================================
- Hits       311456   311351   -105
+ Misses      62850    62806    -44
With the change:
  Received txns count: total: 1496103, rate 164939/s
Without (against cbc2ffb):
  Received txns count: total: 1055977, rate 160998/s
It seems to have better throughput.
ok lets try to remove this 1 alloc per packet and do better. how do you wanna do it?
probably new PacketBatch variant first, then remove PacketBatch in a followup. imagine the latter will dirty the diff a lot |
@lijunwangs can you rerun bench?
This sounds good. Fewer layers of abstraction is always better.
should we delete the gpu code, and then sigverify just uses a
two followups. yes |
Maybe a dumb question. I always thought packet batches are needed here because of the GPU abstraction - if there's no GPU, are they still actually necessary? I don't know how par_iter works, but wouldn't it be better to have several threads concurrently consuming one packet at a time from a queue? UPD: I mean a vector.
Not necessary. Technically, pulling packets off the channel one at a time has its own overheads, so batching does help with that, but if we can almost never fill a batch it is useless in that role.
It splits a vector into a bunch of slices (one per thread in the pool) and gives each slice to a thread. Once a thread is done with its own slice, it will try to "steal" work from other threads that are still busy.
Yes, the code in agave/perf/src/cuda_runtime.rs, lines 54 to 65 (at 4dc5b0e).
shh! quiet part out loud |
Problem
#8356 removed the coalescing in the tpu, which removes the need for a batching thread. now, any batching is done by sigverify.
Summary of Changes
removes packet batch sender, sending from streamer to sigverify directly.
this also removes 3 threads from the validator (quic tpu, quic tpu fwd, quic tpu vote).
This is a draft PR. There is a single allocation done per packet which i am not a fan of. We could get around this by introducing a
PacketBatch::Single(BytesPacket) but i don't wanna make too many changes without getting out this rough draft and collecting feedback.