Introduce thread pool for writing goto binaries in parallel #4236

AlexanderPortland · 2025-07-24T22:36:02Z

Serializing and writing goto binaries is a serious bottleneck for the Kani compiler, with profiling indicating that it takes around half of the total codegen time.

This PR introduces a variable-size thread pool specifically for serializing goto binaries and writing them to disk. Now, instead of having to do everything itself, the main compiler thread just has to collect all the data needed for serialization (requiring relatively inexpensive clones of some local state) and dispatch it to the thread pool's work queue. This allows the main thread to move on quickly while the pool's worker threads handle generating the binaries off of the critical path of compilation.
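The dispatch pattern described above can be sketched with the standard library alone. This is a minimal illustration, not the code in this PR: `ExportJob`, `ExportPool`, `dispatch`, and `finish` are hypothetical names, and a plain byte payload stands in for the cloned compiler state that would be serialized.

```rust
use std::fs;
use std::path::PathBuf;
use std::sync::mpsc::{self, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for everything the main thread clones for one goto binary.
pub struct ExportJob {
    pub path: PathBuf,
    pub payload: Vec<u8>,
}

/// Hypothetical export pool: worker threads pull jobs off a shared queue.
pub struct ExportPool {
    sender: Option<Sender<ExportJob>>,
    workers: Vec<JoinHandle<usize>>,
}

impl ExportPool {
    /// Spawn `num_threads` workers that block on the queue until it is closed.
    pub fn new(num_threads: usize) -> Self {
        assert!(num_threads > 0, "this sketch needs at least one worker");
        let (sender, receiver) = mpsc::channel::<ExportJob>();
        let receiver = Arc::new(Mutex::new(receiver));
        let workers: Vec<JoinHandle<usize>> = (0..num_threads)
            .map(|_| {
                let receiver = Arc::clone(&receiver);
                thread::spawn(move || {
                    let mut written: usize = 0;
                    loop {
                        // The lock guard is dropped at the end of this statement,
                        // so other workers can take jobs while this one writes.
                        let next = receiver.lock().unwrap().recv();
                        match next {
                            Ok(job) => {
                                if fs::write(&job.path, &job.payload).is_ok() {
                                    written += 1;
                                }
                            }
                            // The sender was dropped: no more work will arrive.
                            Err(_) => break,
                        }
                    }
                    written
                })
            })
            .collect();
        ExportPool { sender: Some(sender), workers }
    }

    /// Called on the main thread: enqueue a job and return immediately.
    pub fn dispatch(&self, job: ExportJob) {
        self.sender
            .as_ref()
            .expect("pool already finished")
            .send(job)
            .expect("workers exited early");
    }

    /// Close the queue, wait for the workers, and return how many files were written.
    pub fn finish(mut self) -> usize {
        drop(self.sender.take());
        self.workers.drain(..).map(|w| w.join().expect("worker panicked")).sum()
    }
}
```

In this sketch the main thread would call `dispatch` once per harness and `finish` once at the end of codegen, so the only work left on the critical path is building each `ExportJob`.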

Results

The table below shows wall clock end-to-end compile times before and after this change. This metric corresponds to how long a user would have to wait for Kani's compilation to finish before verification can begin.

| benchmark | compile time before | compile time now | change |
| --- | --- | --- | --- |
| standard library (commit `177d0fd`) | 328s | 233s | -95s (-29%) |
| `prost` crate (from #2505) | 325s | 177s | -148s (-45%) |

from local runs on a 12 core M3 Mac

Resolves #2505.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 and MIT licenses.

tautschnig · 2025-07-25T15:25:34Z

Running `kani autoharness -Z autoharness --list ...` for the standard library previously took 2:05 hours of wall-clock time (and 11300 seconds of CPU time) on an r5.metal. With the changes in this PR and NUM_FILE_EXPORT_THREADS increased to 32 (can we please use the value of --jobs for that?), the same command took 1:25 hours (approximately 25% improvement) despite using 13000 seconds of CPU time (some overhead is not unexpected).

AlexanderPortland · 2025-07-25T15:41:02Z

I also don't think it's unexpected that this introduced regressions on the compile-timer-short CI job. The overhead of spinning up threads and communicating with them is likely not worth it for a workspace with a single proof as the main compiler thread could just as easily write that one file itself.

@tautschnig Does it make sense to cap the number of threads used (taken from the value of --jobs) at # of proof harnesses - 1? For workloads of just a single proof this will force the main thread to just handle everything itself and potentially negate this regression.

tautschnig · 2025-07-28T08:56:02Z

> @tautschnig Does it make sense to cap the number of threads used (taken from the value of --jobs) at # of proof harnesses - 1? For workloads of just a single proof this will force the main thread to just handle everything itself and potentially negate this regression.

Yes, this capping sounds like a good idea!


AlexanderPortland · 2025-07-29T23:00:29Z

I ended up implementing the full calculation for the size of the thread pool as something along the lines of min(# of harnesses - 1, # of CPU cores - 1, SOME_MAX_SENSIBLE_COUNT) rather than taking it directly from the --jobs Kani argument.

Our compiler generates goto files at a fixed rate (currently around twice as fast as they can be exported), so adding any more than 3-4 threads to the export pool seems to have no real performance benefit as the extra threads are mostly sitting around without work to do.
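The sizing rule above can be written down as a small pure function. This is a sketch under assumptions: the function names are hypothetical, and the cap of 4 is only a placeholder chosen from the "3-4 threads" observation, not the constant used in this PR.

```rust
use std::thread;

/// Hypothetical cap, standing in for SOME_MAX_SENSIBLE_COUNT.
/// The value 4 is an assumption based on the "3-4 threads" observation.
const MAX_EXPORT_THREADS: usize = 4;

/// min(# of harnesses - 1, # of CPU cores - 1, cap), saturating at zero
/// so that a single-harness workload gets no extra threads at all.
pub fn export_pool_size(num_harnesses: usize, num_cores: usize) -> usize {
    num_harnesses
        .saturating_sub(1)
        .min(num_cores.saturating_sub(1))
        .min(MAX_EXPORT_THREADS)
}

/// Same calculation, using the parallelism the OS reports for this process.
/// Falls back to a single core if the query fails.
pub fn export_pool_size_for_this_machine(num_harnesses: usize) -> usize {
    let cores = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    export_pool_size(num_harnesses, cores)
}
```

A result of zero corresponds to the single-proof case discussed earlier, where the main thread would simply write the file itself rather than pay for spinning up workers.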

Since the internal performance of our compiler isn't known by Kani users, it feels like this might be better kept as an internal calculation rather than taking it as an input. Does that seem alright @tautschnig?

carolynzech · 2025-07-31T15:41:33Z

The inspiration for this change came from #2505, specifically #2505 (comment). @AlexanderPortland can you check if these changes resolve the issue? Of course resolution here is subjective, since we have to decide what reasonable performance is, but let's do some measurements and see if this version of Kani builds in a more reasonable amount of time.

(This repo may serve as a useful benchmark for some of the other compiler perf improvements you've made as well, in addition to the standard library).

tautschnig

I'm happy with this, but would appreciate attention to Carolyn's comment with regard to #2505.

AlexanderPortland · 2025-08-01T20:49:43Z

Just ran the tests and, on my local machine, this change brings the prost crate's end-to-end compile time down 45%. I've updated the PR description with the full numbers from that benchmark and the standard library.

@vonaka

from the autogenerated :

## What's Changed

* Ensure that contract closures are FnOnce by @vonaka in #4151
* Adjust sized hierarchy for Kani's memory predicates by @tautschnig in #4193
* Update to Rust edition 2024 by @tautschnig in #4197
* `ptr_offset_from`: Replace arithmetic over pointers by offset arithmetic by @tautschnig in #4180
* Automatic cargo update to 2025-07-07 by @github-actions[bot] in #4208
* Bump tests/perf/s2n-quic from `b8f8cca` to `8715fdf` by @dependabot[bot] in #4209
* Upgrade Rust toolchain to 2025-07-04 by @tautschnig in #4199
* Upgrade Rust toolchain to 2025-07-10 by @thanhnguyen-aws in #4215
* Update CBMC dependency to 6.7.1 by @tautschnig in #4178
* Split compiler flags to avoid dependency recompilation by @AlexanderPortland in #4211
* Fix the bug that assign clause cannot be inferred for the inner loop of nested loops by @thanhnguyen-aws in #4179
* Upgrade Rust toolchain to 2025-07-11 by @thanhnguyen-aws in #4219
* Automatic toolchain upgrade to nightly-2025-07-12 by @github-actions[bot] in #4222
* Fix bug: `goto-cc` crash when there are two quantifers in one proof by @thanhnguyen-aws in #4221
* Automatic toolchain upgrade to nightly-2025-07-13 by @github-actions[bot] in #4223
* Automatic cargo update to 2025-07-14 by @github-actions[bot] in #4224
* Cleanup links to issues that have been addressed by @tautschnig in #4200
* Selectively enable and fix (slow) Tokio tests by @tautschnig in #4203
* Bump tests/perf/s2n-quic from `32ba87d` to `1cbd879` by @dependabot[bot] in #4227
* Implement support for Cargo.toml's default-members by @tautschnig in #4201
* Do not invoke memset with count of zero by @tautschnig in #4205
* Support bitwuzla, cvc5, z3 as solver attribute values by @tautschnig in #4218
* Use CBMC's shuffle_vector expression by @tautschnig in #4204
* Move tests from slow/kani back to regular suite by @tautschnig in #4202
* Automatic toolchain upgrade to nightly-2025-07-14 by @github-actions[bot] in #4225
* Enable GitHub Linux/Arm runners in CI by @tautschnig in #3841
* Automatic cargo update to 2025-07-21 by @github-actions[bot] in #4231
* Skip codegen for unneeded harnesses by @AlexanderPortland in #4213
* Strongly type differing compiler args for clarity by @AlexanderPortland in #4220
* Remove StableMIR ICE workaround by @carolynzech in #4235
* Fix bug: Kani unwinds loops with contract in generic function (with -Z loop-contracts) by @thanhnguyen-aws in #4232
* Automatic cargo update to 2025-07-28 by @github-actions[bot] in #4238
* Bump tests/perf/s2n-quic from `1cbd879` to `4938450` by @dependabot[bot] in #4242
* Upgrade Rust toolchain to 2025-07-21 by @tautschnig in #4241
* Remove `pretty_ty` and use rustc_public's formatter instead by @tautschnig in #4243
* Upgrade Rust toolchain to 2025-07-24 by @tautschnig in #4244
* Documentation cleanup of UB detected by Kani by @tautschnig in #4245
* Upgrade Rust toolchain to 2025-07-29 by @tautschnig in #4247
* Automatic toolchain upgrade to nightly-2025-07-30 by @github-actions[bot] in #4253
* Add unstable option prove-safety-only by @tautschnig in #4239
* Set bits_per_byte in byte_extract expressions by @tautschnig in #4255
* `KaniAttributes` Path Resolution Refactor by @carolynzech in #4249
* Automatic toolchain upgrade to nightly-2025-07-31 by @github-actions[bot] in #4256
* Support contracts & stubs in trait implementations (partial fix) by @carolynzech in #4250
* [Breaking Changes] Remove unstable list feature and default memory checks by @carolynzech in #4258
* Upgrade Rust toolchain to 2025-08-01 by @tautschnig in #4261
* Autoharness: add support for references by @tautschnig in #4234
* Turn off debug assertions under `--prove-safety-only` by @tautschnig in #4262
* Automatic toolchain upgrade to nightly-2025-08-02 by @github-actions[bot] in #4264
* Automatic toolchain upgrade to nightly-2025-08-03 by @github-actions[bot] in #4265
* Automatic cargo update to 2025-08-04 by @github-actions[bot] in #4267
* Automatic toolchain upgrade to nightly-2025-08-04 by @github-actions[bot] in #4266
* Introduce thread pool for writing goto binaries in parallel by @AlexanderPortland in #4236
* Major-version update cargo dependencies by @tautschnig in #4240
* Bump tests/perf/s2n-quic from `4938450` to `8f510f0` by @dependabot[bot] in #4270
* Automatic toolchain upgrade to nightly-2025-08-05 by @github-actions[bot] in #4271
* Automatic toolchain upgrade to nightly-2025-08-06 by @github-actions[bot] in #4272
* Avoid updating irrelevant symbols when handling quantifiers by @AlexanderPortland in #4268
* Lazily evaluate debug info by @AlexanderPortland in #4269
* Clone a template `BodyTransformer` to avoid re-initialization by @AlexanderPortland in #4259
* Ensuring that MIR constants are marked as static consts by @vonaka in #4233
* Fix release job dependencies by @tautschnig in #4273

## New Contributors

* @vonaka made their first contribution in #4151

**Full Changelog**: kani-0.64.0...kani-0.65.0

---------

Co-authored-by: Zyad Hassan

AlexanderPortland added 2 commits July 24, 2025 15:20

make cbmc string interners thread local

2895e26

add thread pool for writing goto files

a50fbaa

github-actions bot added the Z-EndToEndBenchCI and Z-CompilerBenchCI labels (each described as "Tag a PR to run benchmark CI") Jul 24, 2025

AlexanderPortland force-pushed the parallel-export branch 2 times, most recently from afd4f52 to ba2a7e8 July 28, 2025 23:21

cap thread pool size at # of harnesses

14336e3

* thread pools are now dynamically sized
* that size is set based on a const var capped by the # of harnesses

AlexanderPortland force-pushed the parallel-export branch from ba2a7e8 to 14336e3 July 28, 2025 23:23

AlexanderPortland and others added 5 commits July 29, 2025 11:07

Merge branch 'main' into parallel-export

25d33d4

avoid cloning & dropping symbol tables in main compiler thread

a8d29e6

clarify how we calculate # of threads from --jobs

15c9fa3

calculate good size for export thread pool

30547a7

based on available parallelism, # of harnesses, and the rate the compiler can generate goto files

cleanup & doc comment fixes

0ac81c8

AlexanderPortland marked this pull request as ready for review July 29, 2025 23:01

AlexanderPortland requested a review from a team as a code owner July 29, 2025 23:01

AlexanderPortland requested a review from remi-delmas-3000 July 29, 2025 23:02

AlexanderPortland mentioned this pull request Jul 29, 2025

Add heuristic for ordering harness codegen #4248

Open

AlexanderPortland mentioned this pull request Jul 31, 2025

Add heuristic to order harness codegen #4257

Open

Merge branch 'main' into parallel-export

24921ef

tautschnig approved these changes Aug 1, 2025

tautschnig assigned AlexanderPortland Aug 1, 2025

AlexanderPortland mentioned this pull request Aug 1, 2025

Kani-Compiler slow for PROST w/ PropProof. #2505

Closed

Merge branch 'main' into parallel-export

89fa76e

tautschnig added this pull request to the merge queue Aug 4, 2025

Merged via the queue into model-checking:main with commit 8adc279 Aug 4, 2025
16 of 18 checks passed

AlexanderPortland deleted the parallel-export branch August 4, 2025 15:52

rajath-mk mentioned this pull request Aug 6, 2025

bump kani vesion 0.65.0 #4274

Merged
