@paolobarbolini
Contributor

paolobarbolini commented Jan 4, 2024

Tokio 1.36 exposed Receiver::poll_recv_many, which allows polling multiple items at once from an mpsc channel. Considering that our write buffer already has a lenient capacity limit, and the benchmarks show that the reduced overhead is worth it, this PR makes the connection handler poll 16 commands at once instead of just 1.
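
For readers unfamiliar with the pattern, here is a minimal sketch of draining a channel in bounded batches. It is not the code from this PR: to stay dependency-free it uses std::sync::mpsc and a non-blocking try_recv loop as a stand-in for Tokio's poll_recv_many, which appends up to a given limit of ready messages to a Vec in a single poll. The names BATCH_LIMIT, drain_ready, and batch_sizes are hypothetical and exist only for illustration.

```rust
use std::sync::mpsc::{Receiver, TryRecvError};

/// Hypothetical batch size, mirroring the "16 commands at once" in the description.
const BATCH_LIMIT: usize = 16;

/// Moves up to `limit` already-queued items from `rx` into `buffer` without
/// blocking, and returns how many items were appended.
/// (Stand-in for a single batched poll; not Tokio's implementation.)
fn drain_ready<T>(rx: &Receiver<T>, buffer: &mut Vec<T>, limit: usize) -> usize {
    let mut received = 0;
    while received < limit {
        match rx.try_recv() {
            Ok(item) => {
                buffer.push(item);
                received += 1;
            }
            // Nothing more is ready right now, or every sender is gone.
            Err(TryRecvError::Empty) | Err(TryRecvError::Disconnected) => break,
        }
    }
    received
}

/// Drains everything currently queued in `rx`, one batch per iteration,
/// and returns the size of each batch that was handled.
fn batch_sizes(rx: &Receiver<u32>) -> Vec<usize> {
    let mut sizes = Vec::new();
    let mut buffer = Vec::with_capacity(BATCH_LIMIT);
    loop {
        buffer.clear(); // reuse the same allocation for every batch
        let n = drain_ready(rx, &mut buffer, BATCH_LIMIT);
        if n == 0 {
            break;
        }
        // A real handler would encode `buffer` into its write buffer here.
        sizes.push(n);
    }
    sizes
}
```

In this sketch, 40 queued commands with a limit of 16 are handled in three batches (16, 16, 8) rather than 40 separate receives, which is the kind of per-item overhead the change aims to reduce.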

Benchmark results (Ryzen 5900X)
Gnuplot not found, using plotters backend
nats::publish_throughput/32
                        time:   [86.357 ms 91.881 ms 95.353 ms]
                        thrpt:  [160.02 MiB/s 166.07 MiB/s 176.69 MiB/s]
                 change:
                        time:   [-27.813% -21.773% -15.602%] (p = 0.00 < 0.05)
                        thrpt:  [+18.487% +27.834% +38.529%]
                        Performance has improved.
nats::publish_throughput/1024
                        time:   [170.67 ms 183.75 ms 197.30 ms]
                        thrpt:  [2.4168 GiB/s 2.5951 GiB/s 2.7939 GiB/s]
                 change:
                        time:   [-30.036% -23.320% -15.435%] (p = 0.00 < 0.05)
                        thrpt:  [+18.253% +30.412% +42.931%]
                        Performance has improved.
Benchmarking nats::publish_throughput/8192: Warming up for 1.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 10.8s.
nats::publish_throughput/8192
                        time:   [1.0787 s 1.1174 s 1.1481 s]
                        thrpt:  [3.3227 GiB/s 3.4139 GiB/s 3.5363 GiB/s]
                 change:
                        time:   [-16.965% -12.838% -8.4008%] (p = 0.00 < 0.05)
                        thrpt:  [+9.1712% +14.729% +20.431%]
                        Performance has improved.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) low mild

Benchmarking nats::publish_amount/32: Warming up for 1.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 5.4s or enable flat sampling.
nats::publish_amount/32 time:   [77.440 ms 88.434 ms 96.639 ms]
                        thrpt:  [5.1739 Melem/s 5.6539 Melem/s 6.4566 Melem/s]
                 change:
                        time:   [-35.253% -26.741% -17.500%] (p = 0.00 < 0.05)
                        thrpt:  [+21.212% +36.502% +54.447%]
                        Performance has improved.
nats::publish_amount/1024
                        time:   [163.45 ms 174.96 ms 187.47 ms]
                        thrpt:  [2.6670 Melem/s 2.8578 Melem/s 3.0590 Melem/s]
                 change:
                        time:   [-35.579% -29.700% -23.342%] (p = 0.00 < 0.05)
                        thrpt:  [+30.449% +42.248% +55.229%]
                        Performance has improved.
Benchmarking nats::publish_amount/8192: Warming up for 1.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 11.3s.
nats::publish_amount/8192
                        time:   [1.0952 s 1.1479 s 1.1899 s]
                        thrpt:  [420.19 Kelem/s 435.59 Kelem/s 456.55 Kelem/s]
                 change:
                        time:   [-16.233% -11.968% -7.6855%] (p = 0.00 < 0.05)
                        thrpt:  [+8.3254% +13.594% +19.379%]
                        Performance has improved.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) low mild

nats::subscribe_amount/32
                        time:   [279.04 ms 323.40 ms 363.97 ms]
                        thrpt:  [1.3737 Melem/s 1.5461 Melem/s 1.7919 Melem/s]
                 change:
                        time:   [-26.028% -12.368% +2.3029%] (p = 0.14 > 0.05)
                        thrpt:  [-2.2511% +14.114% +35.186%]
                        No change in performance detected.
nats::subscribe_amount/1024
                        time:   [356.16 ms 373.04 ms 389.21 ms]
                        thrpt:  [1.2846 Melem/s 1.3404 Melem/s 1.4038 Melem/s]
                 change:
                        time:   [-10.456% -4.4544% +1.6193%] (p = 0.20 > 0.05)
                        thrpt:  [-1.5935% +4.6621% +11.677%]
                        No change in performance detected.
Benchmarking nats::subscribe_amount/8192: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 16.0s.
nats::subscribe_amount/8192
                        time:   [1.5499 s 1.5847 s 1.6154 s]
                        thrpt:  [309.52 Kelem/s 315.52 Kelem/s 322.61 Kelem/s]
                 change:
                        time:   [-4.5167% -1.9550% +0.2916%] (p = 0.16 > 0.05)
                        thrpt:  [-0.2907% +1.9940% +4.7304%]
                        No change in performance detected.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) low mild

Benchmarking nats::request_amount/32: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 7.7s.
nats::request_amount/32 time:   [767.62 ms 783.32 ms 798.11 ms]
                        thrpt:  [12.530 Kelem/s 12.766 Kelem/s 13.027 Kelem/s]
                 change:
                        time:   [-4.2668% -0.2408% +3.9516%] (p = 0.91 > 0.05)
                        thrpt:  [-3.8014% +0.2414% +4.4569%]
                        No change in performance detected.
Benchmarking nats::request_amount/1024: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 7.4s.
nats::request_amount/1024
                        time:   [789.21 ms 802.72 ms 817.36 ms]
                        thrpt:  [12.234 Kelem/s 12.458 Kelem/s 12.671 Kelem/s]
                 change:
                        time:   [-4.8586% -2.2998% +0.4285%] (p = 0.12 > 0.05)
                        thrpt:  [-0.4267% +2.3539% +5.1067%]
                        No change in performance detected.
Benchmarking nats::request_amount/8192: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 8.6s.
nats::request_amount/8192
                        time:   [843.46 ms 858.24 ms 872.76 ms]
                        thrpt:  [11.458 Kelem/s 11.652 Kelem/s 11.856 Kelem/s]
                 change:
                        time:   [-2.2396% +1.3327% +5.8574%] (p = 0.57 > 0.05)
                        thrpt:  [-5.5333% -1.3151% +2.2909%]
                        No change in performance detected.
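
A note on reading the numbers above: the time and thrpt rows describe the same measurement, since throughput is inversely proportional to run time. Assuming that relationship, a relative time change t implies a throughput change of 1/(1+t) - 1, which is why a 21.8% drop in time appears as a 27.8% gain in throughput. The helper name below is hypothetical.

```rust
/// Converts a relative change in run time (e.g. -0.21773 for -21.773%)
/// into the implied relative change in throughput, assuming the amount of
/// work per run is fixed so that throughput is proportional to 1 / time.
fn throughput_change(time_change: f64) -> f64 {
    1.0 / (1.0 + time_change) - 1.0
}
```

For example, passing the -23.320% time change of nats::publish_throughput/1024 gives roughly +30.4%, in line with the thrpt row reported for that benchmark.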

@Jarema
Member

Jarema commented Jan 4, 2024

That is a really nice improvement, @paolobarbolini!
Is there a timeline for when Tokio 1.36 will be released?

@paolobarbolini
Contributor Author

From what I can gather from the latest releases, it shouldn't take more than a few weeks.
This is what I got from the tokio-rs Discord 😄 https://discord.com/channels/500028886025895936/500336346770964480/1192423210780737578

@paolobarbolini paolobarbolini marked this pull request as ready for review January 30, 2024 15:18
@paolobarbolini
Contributor Author

Tokio v1.36 is out

@paolobarbolini
Contributor Author

> Tokio v1.36 is out

OK, it's not. It's just a PR 😞
tokio-rs/tokio#6312

@paolobarbolini
Contributor Author

Now it's out for real

Member

@Jarema Jarema left a comment

It's awesome to leverage this new API and gain roughly 30% in publish performance!

Thanks @paolobarbolini

Just some minor questions.

Member

@Jarema Jarema left a comment

LGTM!

@Jarema Jarema merged commit 00f1d56 into nats-io:main Feb 2, 2024