@AlexanderPortland
Contributor

@AlexanderPortland AlexanderPortland commented Jul 24, 2025

Serializing and writing goto binaries is a serious bottleneck for the Kani compiler, with profiling indicating that it takes around half of the total codegen time.

This PR introduces a variable-size thread pool specifically for serializing goto binaries and writing them to disk. Now, instead of having to do everything itself, the main compiler thread just has to collect all the data needed for serialization (requiring relatively inexpensive clones of some local state) and dispatch it to the thread pool's work queue. This allows the main thread to move on quickly while the pool's worker threads handle generating the binaries off the critical path of compilation.
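As a rough illustration of the dispatch pattern described above (not the code in this PR), a minimal sketch using only the standard library might look like the following. `ExportJob`, `serialize`, and `export_all` are hypothetical names invented for this example, and the "serialization" is a stand-in that concatenates bytes instead of writing a goto binary to disk.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for the state the main thread clones for one goto binary.
struct ExportJob {
    name: String,
    payload: Vec<u8>,
}

/// Hypothetical stand-in for serialization; a real worker would also write the file.
fn serialize(job: &ExportJob) -> Vec<u8> {
    let mut out = job.name.as_bytes().to_vec();
    out.extend_from_slice(&job.payload);
    out
}

/// Sends every job to a shared work queue drained by `num_workers` threads and
/// returns (name, serialized length) pairs sorted by name.
fn export_all(jobs: Vec<ExportJob>, num_workers: usize) -> Vec<(String, usize)> {
    assert!(num_workers > 0, "this sketch needs at least one worker");

    let (job_tx, job_rx) = mpsc::channel::<ExportJob>();
    let job_rx = Arc::new(Mutex::new(job_rx));
    let (done_tx, done_rx) = mpsc::channel::<(String, usize)>();

    let workers: Vec<_> = (0..num_workers)
        .map(|_| {
            let job_rx = Arc::clone(&job_rx);
            let done_tx = done_tx.clone();
            thread::spawn(move || loop {
                // The lock is held only while taking the next job off the queue.
                let next = job_rx.lock().unwrap().recv();
                match next {
                    Ok(job) => {
                        let bytes = serialize(&job);
                        done_tx.send((job.name, bytes.len())).unwrap();
                    }
                    // The sender was dropped and the queue is empty: shut down.
                    Err(_) => break,
                }
            })
        })
        .collect();
    drop(done_tx);

    // The "main compiler thread" only enqueues work; sends do not block.
    for job in jobs {
        job_tx.send(job).unwrap();
    }
    drop(job_tx);

    for worker in workers {
        worker.join().unwrap();
    }
    let mut results: Vec<_> = done_rx.iter().collect();
    results.sort();
    results
}
```

Calling `export_all(jobs, 2)` hands the jobs to two workers. In this sketch the caller waits for the workers at the end, whereas a compiler would keep generating code between sends and join only once all code generation is done.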

Results

The table below shows wall clock end-to-end compile times before and after this change. This metric corresponds to how long a user would have to wait for Kani's compilation to finish before verification can begin.

| benchmark | compile time before | compile time now | change |
| --- | --- | --- | --- |
| standard library (commit 177d0fd) | 328s | 233s | -95s (-29%) |
| prost crate (from #2505) | 325s | 177s | -148s (-45%) |

from local runs on a 12 core M3 Mac

Resolves #2505.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 and MIT licenses.

@github-actions github-actions bot added Z-EndToEndBenchCI Z-CompilerBenchCI labels Jul 24, 2025
@tautschnig
Member

Running kani autoharness -Z autoharness --list ... for the standard library previously took 2:05 hours of wall-clock time (and 11300 seconds of CPU time) on an r5.metal. With the changes in this PR and NUM_FILE_EXPORT_THREADS increased to 32 (can we please use the value of --jobs for that?), the same command took 1:25 hours (approximately 25% improvement) despite using 13000 seconds of CPU time (some overhead is not unexpected).

@AlexanderPortland
Contributor Author

I also don't think it's unexpected that this introduced regressions on the compile-timer-short CI job. The overhead of spinning up threads and communicating with them is likely not worth it for a workspace with a single proof as the main compiler thread could just as easily write that one file itself.

@tautschnig Does it make sense to cap the number of threads used (taken from the value of --jobs) at # of proof harnesses - 1? For workloads of just a single proof this will force the main thread to just handle everything itself and potentially negate this regression.

@tautschnig
Member

> @tautschnig Does it make sense to cap the number of threads used (taken from the value of --jobs) at # of proof harnesses - 1? For workloads of just a single proof this will force the main thread to just handle everything itself and potentially negate this regression.

Yes, this capping sounds like a good idea!

@AlexanderPortland AlexanderPortland force-pushed the parallel-export branch 2 times, most recently from afd4f52 to ba2a7e8 Compare July 28, 2025 23:21
* thread pools are now dynamically sized
* that size is set based on a const var capped by the # of harnesses
@AlexanderPortland
Contributor Author

AlexanderPortland commented Jul 29, 2025

I ended up implementing the full calculation for the size of the thread pool as something along the lines of min(# of harnesses - 1, # of CPU cores - 1, SOME_MAX_SENSIBLE_COUNT) rather than taking it directly from the --jobs Kani argument.

Our compiler generates goto files at a fixed rate (currently around twice as fast as they can be exported), so adding any more than 3-4 threads to the export pool seems to have no real performance benefit as the extra threads are mostly sitting around without work to do.

Since the internal performance of our compiler isn't known by Kani users, it feels like this might be better kept as an internal calculation rather than taking it as an input. Does that seem alright @tautschnig?
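For readers following along, the sizing rule described in this comment can be sketched roughly as below. This is an illustration under stated assumptions, not the merged implementation: `MAX_EXPORT_THREADS`, `export_pool_size`, and `detected_cores` are hypothetical names, and the cap of 4 is an assumption taken from the "3-4 threads" observation above.

```rust
use std::thread;

/// Assumed cap: the comment above reports no real benefit beyond 3-4 export threads.
const MAX_EXPORT_THREADS: usize = 4;

/// min(# of harnesses - 1, # of CPU cores - 1, MAX_EXPORT_THREADS), saturating at zero.
/// A result of 0 means the main thread should write the files itself.
fn export_pool_size(num_harnesses: usize, num_cores: usize) -> usize {
    num_harnesses
        .saturating_sub(1)
        .min(num_cores.saturating_sub(1))
        .min(MAX_EXPORT_THREADS)
}

/// Core count as reported by the standard library, falling back to 1 if unknown.
fn detected_cores() -> usize {
    thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
}
```

With a single harness, `export_pool_size(1, detected_cores())` is 0, which matches the goal of letting the main thread handle a lone proof without any thread start-up cost.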

@AlexanderPortland AlexanderPortland marked this pull request as ready for review July 29, 2025 23:01
@AlexanderPortland AlexanderPortland requested a review from a team as a code owner July 29, 2025 23:01
@carolynzech
Contributor

The inspiration for this change came from #2505, specifically #2505 (comment). @AlexanderPortland can you check if these changes resolve the issue? Of course, resolution here is subjective, since we have to decide what reasonable performance is, but let's do some measurements and see if this version of Kani builds in a more reasonable amount of time.

(This repo may serve as a useful benchmark for some of the other compiler perf improvements you've made as well, in addition to the standard library).

Member

@tautschnig tautschnig left a comment

I'm happy with this, but would appreciate attention to Carolyn's comment with regard to #2505.

@AlexanderPortland
Copy link
Contributor Author

Just ran the tests and, on my local machine, this change brings the prost crate's end to end compile time down 45%. I've updated the PR description with the full #s from that benchmark and the standard library.

@tautschnig tautschnig added this pull request to the merge queue Aug 4, 2025
Merged via the queue into model-checking:main with commit 8adc279 Aug 4, 2025
16 of 18 checks passed
@AlexanderPortland AlexanderPortland deleted the parallel-export branch August 4, 2025 15:52
github-merge-queue bot pushed a commit that referenced this pull request Aug 7, 2025
from the autogenerated : 

## What's Changed
* Ensure that contract closures are FnOnce by @vonaka in
#4151
* Adjust sized hierarchy for Kani's memory predicates by @tautschnig in
#4193
* Update to Rust edition 2024 by @tautschnig in
#4197
* `ptr_offset_from`: Replace arithmetic over pointers by offset
arithmetic by @tautschnig in
#4180
* Automatic cargo update to 2025-07-07 by @github-actions[bot] in
#4208
* Bump tests/perf/s2n-quic from `b8f8cca` to `8715fdf` by
@dependabot[bot] in #4209
* Upgrade Rust toolchain to 2025-07-04 by @tautschnig in
#4199
* Upgrade Rust toolchain to 2025-07-10 by @thanhnguyen-aws in
#4215
* Update CBMC dependency to 6.7.1 by @tautschnig in
#4178
* Split compiler flags to avoid dependency recompilation by
@AlexanderPortland in #4211
* Fix the bug that assign clause cannot be inferred for the inner loop
of nested loops by @thanhnguyen-aws in
#4179
* Upgrade Rust toolchain to 2025-07-11 by @thanhnguyen-aws in
#4219
* Automatic toolchain upgrade to nightly-2025-07-12 by
@github-actions[bot] in #4222
* Fix bug: `goto-cc` crash when there are two quantifers in one proof by
@thanhnguyen-aws in #4221
* Automatic toolchain upgrade to nightly-2025-07-13 by
@github-actions[bot] in #4223
* Automatic cargo update to 2025-07-14 by @github-actions[bot] in
#4224
* Cleanup links to issues that have been addressed by @tautschnig in
#4200
* Selectively enable and fix (slow) Tokio tests by @tautschnig in
#4203
* Bump tests/perf/s2n-quic from `32ba87d` to `1cbd879` by
@dependabot[bot] in #4227
* Implement support for Cargo.toml's default-members by @tautschnig in
#4201
* Do not invoke memset with count of zero by @tautschnig in
#4205
* Support bitwuzla, cvc5, z3 as solver attribute values by @tautschnig
in #4218
* Use CBMC's shuffle_vector expression by @tautschnig in
#4204
* Move tests from slow/kani back to regular suite by @tautschnig in
#4202
* Automatic toolchain upgrade to nightly-2025-07-14 by
@github-actions[bot] in #4225
* Enable GitHub Linux/Arm runners in CI by @tautschnig in
#3841
* Automatic cargo update to 2025-07-21 by @github-actions[bot] in
#4231
* Skip codegen for unneeded harnesses by @AlexanderPortland in
#4213
* Strongly type differing compiler args for clarity by
@AlexanderPortland in #4220
* Remove StableMIR ICE workaround by @carolynzech in
#4235
* Fix bug: Kani unwinds loops with contract in generic function (with -Z
loop-contracts) by @thanhnguyen-aws in
#4232
* Automatic cargo update to 2025-07-28 by @github-actions[bot] in
#4238
* Bump tests/perf/s2n-quic from `1cbd879` to `4938450` by
@dependabot[bot] in #4242
* Upgrade Rust toolchain to 2025-07-21 by @tautschnig in
#4241
* Remove `pretty_ty` and use rustc_public's formatter instead by
@tautschnig in #4243
* Upgrade Rust toolchain to 2025-07-24 by @tautschnig in
#4244
* Documentation cleanup of UB detected by Kani by @tautschnig in
#4245
* Upgrade Rust toolchain to 2025-07-29 by @tautschnig in
#4247
* Automatic toolchain upgrade to nightly-2025-07-30 by
@github-actions[bot] in #4253
* Add unstable option prove-safety-only by @tautschnig in
#4239
* Set bits_per_byte in byte_extract expressions by @tautschnig in
#4255
* `KaniAttributes` Path Resolution Refactor by @carolynzech in
#4249
* Automatic toolchain upgrade to nightly-2025-07-31 by
@github-actions[bot] in #4256
* Support contracts & stubs in trait implementations (partial fix) by
@carolynzech in #4250
* [Breaking Changes] Remove unstable list feature and default memory
checks by @carolynzech in
#4258
* Upgrade Rust toolchain to 2025-08-01 by @tautschnig in
#4261
* Autoharness: add support for references by @tautschnig in
#4234
* Turn off debug assertions under `--prove-safety-only` by @tautschnig
in #4262
* Automatic toolchain upgrade to nightly-2025-08-02 by
@github-actions[bot] in #4264
* Automatic toolchain upgrade to nightly-2025-08-03 by
@github-actions[bot] in #4265
* Automatic cargo update to 2025-08-04 by @github-actions[bot] in
#4267
* Automatic toolchain upgrade to nightly-2025-08-04 by
@github-actions[bot] in #4266
* Introduce thread pool for writing goto binaries in parallel by
@AlexanderPortland in #4236
* Major-version update cargo dependencies by @tautschnig in
#4240
* Bump tests/perf/s2n-quic from `4938450` to `8f510f0` by
@dependabot[bot] in #4270
* Automatic toolchain upgrade to nightly-2025-08-05 by
@github-actions[bot] in #4271
* Automatic toolchain upgrade to nightly-2025-08-06 by
@github-actions[bot] in #4272
* Avoid updating irrelevant symbols when handling quantifiers by
@AlexanderPortland in #4268
* Lazily evaluate debug info by @AlexanderPortland in
#4269
* Clone a template `BodyTransformer` to avoid re-initialization by
@AlexanderPortland in #4259
* Ensuring that MIR constants are marked as static consts by @vonaka in
#4233
* Fix release job dependencies by @tautschnig in
#4273

## New Contributors
* @vonaka made their first contribution in
#4151

**Full Changelog**:
kani-0.64.0...kani-0.65.0

---------

Co-authored-by: Zyad Hassan