Ignore intrinsic calls in cross-crate-inlining cost model #145910
@bors try @rust-timer queue
    if let Some((fn_def_id, _)) = func.const_fn_def() {
        if self.tcx.intrinsic(fn_def_id).is_some() {
Nit: this would benefit from being combined into a single `if`, using either let-chaining or `is_some_and`.
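The nit can be sketched outside the compiler. In the snippet below, `DefId`, `Callee`, and `Tcx` are hypothetical stand-ins invented for illustration; they only mimic the shape of the calls in the diff and are not rustc's actual types or signatures. It puts the nested form next to an `Option::is_some_and` form. On a toolchain and edition that support let chains, the same condition could presumably be written as a single `if let ... && ...` instead.

```rust
// Hypothetical stand-ins for the compiler types in the diff; not rustc's API.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct DefId(u32);

struct Callee {
    def: Option<(DefId, ())>,
}

impl Callee {
    // Mimics the shape of `func.const_fn_def()`: `Some` for a statically known callee.
    fn const_fn_def(&self) -> Option<(DefId, ())> {
        self.def
    }
}

struct Tcx {
    intrinsics: Vec<DefId>,
}

impl Tcx {
    // Mimics the shape of `tcx.intrinsic(def_id)`: `Some` only for intrinsics.
    fn intrinsic(&self, id: DefId) -> Option<&'static str> {
        self.intrinsics.contains(&id).then_some("intrinsic")
    }
}

// Nested form, as in the reviewed diff.
fn is_intrinsic_call_nested(tcx: &Tcx, func: &Callee) -> bool {
    if let Some((fn_def_id, _)) = func.const_fn_def() {
        if tcx.intrinsic(fn_def_id).is_some() {
            return true;
        }
    }
    false
}

// Single-condition form using `Option::is_some_and`.
fn is_intrinsic_call_flat(tcx: &Tcx, func: &Callee) -> bool {
    func.const_fn_def()
        .is_some_and(|(fn_def_id, _)| tcx.intrinsic(fn_def_id).is_some())
}
```

Both functions should agree on every input; the flat form simply removes one level of nesting.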
Finished benchmarking commit (e8d1f9d): comparison URL.
Overall result: ❌✅ regressions and improvements - please read the text below
Benchmarking this pull request means it may be perf-sensitive - we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.
Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @bors rollup=never
Instruction count
Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)
Results (primary -1.1%, secondary -3.0%)
A less reliable metric. May be of interest, but not used to determine the overall result above.
Cycles
Results (primary 0.8%, secondary 0.6%)
A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary size
Results (primary 0.1%, secondary -0.1%)
A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 466.645s -> 466.461s (-0.04%)
@bors try @rust-timer queue
@bors try cancel
Try build cancelled. Cancelled workflows:
@bors try @rust-timer queue
Finished benchmarking commit (0f272e5): comparison URL.
Overall result: ❌✅ regressions and improvements - please read the text below
Benchmarking this pull request means it may be perf-sensitive - we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.
Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @bors rollup=never
Instruction count
Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)
Results (primary -0.0%, secondary -2.6%)
A less reliable metric. May be of interest, but not used to determine the overall result above.
Cycles
Results (primary 2.5%, secondary 0.4%)
A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary size
Results (primary 0.1%, secondary -0.1%)
A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 468.329s -> 467.725s (-0.13%)
Force-pushed from 5dc2b2e to 53bb74b
Force-pushed from 53bb74b to ab91a63
@Kobzol I figured out this case, see the updated PR description
Some changes occurred to MIR optimizations cc @rust-lang/wg-mir-opt
Thanks a lot for working on this. I wonder how much we should extend this to simple wrapper functions that do almost nothing except calling another function.
For perf triage: the perf run results in CGU shuffling, with wild changes in perf. Without this CGU effect, this PR is a net improvement.
@bors r+
I tried that before in #116898, and at a glance it has the same CGU shuffling problem and needs the same comparison trick I used here. Also that PR is 2 years old so the perf might be completely different now.
☀️ Test successful - checks-actions
What is this?
This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.
Comparing 2f3f27b (parent) -> a09fbe2 (this PR)
Test differences
1 doctest diff was found. Doctest diffs are ignored, as they are noisy.
Test dashboard
Run
    cargo run --manifest-path src/ci/citool/Cargo.toml -- \
        test-dashboard a09fbe2c8372643a27a8082236120f95ed4e6bba --output-dir test-dashboard
And then open the dashboard generated in test-dashboard.
Job duration changes
How to interpret the job duration changes?
Job durations can vary a lot, based on the actual runner instance
Finished benchmarking commit (a09fbe2): comparison URL.
Overall result: ❌✅ regressions and improvements - please read the text below
Our benchmarks found a performance regression caused by this PR.
Next Steps:
@rustbot label: +perf-regression
Instruction count
Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)
Results (primary 2.1%, secondary 3.0%)
A less reliable metric. May be of interest, but not used to determine the overall result above.
Cycles
Results (primary 1.6%, secondary 2.7%)
A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary size
Results (primary 0.0%, secondary -0.1%)
A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 468.032s -> 468.082s (0.01%)
Starting with Rust 1.91.0 (released 2025-10-30), in upstream commit
ab91a63d403b ("Ignore intrinsic calls in cross-crate-inlining cost model")
[1][2], `bindings.o` stops containing DWARF debug information because the
`Default` implementations contained `write_bytes()` calls which are now
ignored in that cost model (note that `CLIPPY=1` does not reproduce it).
This means `gendwarfksyms` complains:
    RUSTC L rust/bindings.o
    error: gendwarfksyms: process_module: dwarf_get_units failed: no debugging information?
For the moment, conditionally skip `gendwarfksyms` for Rust >= 1.91.0.
Cc: [email protected] # Needed in 6.12.y and later (Rust is pinned in older LTSs).
Reported-by: Haiyue Wang <[email protected]>
Closes: https://lore.kernel.org/rust-for-linux/[email protected]/
Link: rust-lang/rust@ab91a63 [1]
Link: rust-lang/rust#145910 [2]
Signed-off-by: Miguel Ojeda <[email protected]>
Starting with Rust 1.91.0 (released 2025-10-30), in upstream commit
ab91a63d403b ("Ignore intrinsic calls in cross-crate-inlining cost model")
[1][2], `bindings.o` stops containing DWARF debug information because the
`Default` implementations contained `write_bytes()` calls which are now
ignored in that cost model (note that `CLIPPY=1` does not reproduce it).
This means `gendwarfksyms` complains:
    RUSTC L rust/bindings.o
    error: gendwarfksyms: process_module: dwarf_get_units failed: no debugging information?
There are several alternatives that would work here: conditionally
skipping in the cases needed (but that is subtle and brittle), forcing
DWARF generation with e.g. a dummy `static` (ugly and we may need to
do it in several crates), skipping the call to the tool in the Kbuild
command when there are no exports (fine) or teaching the tool to do so
itself (simple and clean).
Thus do the last one: don't attempt to process files if we have no symbol
versions to calculate.
[ I used the commit log of my patch linked below since it explained the
root issue and expanded it a bit more to summarize the alternatives.
- Miguel ]
Cc: [email protected] # Needed in 6.12.y and later (Rust is pinned in older LTSs).
Reported-by: Haiyue Wang <[email protected]>
Closes: https://lore.kernel.org/rust-for-linux/[email protected]/
Suggested-by: Miguel Ojeda <[email protected]>
Link: https://lore.kernel.org/rust-for-linux/CANiq72nKC5r24VHAp9oUPR1HVPqT+=0ab9N0w6GqTF-kJOeiSw@mail.gmail.com/
Link: rust-lang/rust@ab91a63 [1]
Link: rust-lang/rust#145910 [2]
Signed-off-by: Sami Tolvanen <[email protected]>
Signed-off-by: Miguel Ojeda <[email protected]>
Starting with Rust 1.91.0 (released 2025-10-30), in upstream commit
ab91a63d403b ("Ignore intrinsic calls in cross-crate-inlining cost model")
[1][2], `bindings.o` stops containing DWARF debug information because the
`Default` implementations contained `write_bytes()` calls which are now
ignored in that cost model (note that `CLIPPY=1` does not reproduce it).
This means `gendwarfksyms` complains:
    RUSTC L rust/bindings.o
    error: gendwarfksyms: process_module: dwarf_get_units failed: no debugging information?
There are several alternatives that would work here: conditionally
skipping in the cases needed (but that is subtle and brittle), forcing
DWARF generation with e.g. a dummy `static` (ugly and we may need to
do it in several crates), skipping the call to the tool in the Kbuild
command when there are no exports (fine) or teaching the tool to do so
itself (simple and clean).
Thus do the last one: don't attempt to process files if we have no symbol
versions to calculate.
[ I used the commit log of my patch linked below since it explained the
root issue and expanded it a bit more to summarize the alternatives.
- Miguel ]
Cc: [email protected] # Needed in 6.17.y.
Reported-by: Haiyue Wang <[email protected]>
Closes: https://lore.kernel.org/rust-for-linux/[email protected]/
Suggested-by: Miguel Ojeda <[email protected]>
Link: https://lore.kernel.org/rust-for-linux/CANiq72nKC5r24VHAp9oUPR1HVPqT+=0ab9N0w6GqTF-kJOeiSw@mail.gmail.com/
Link: rust-lang/rust@ab91a63 [1]
Link: rust-lang/rust#145910 [2]
Signed-off-by: Sami Tolvanen <[email protected]>
Tested-by: Haiyue Wang <[email protected]>
Reviewed-by: Alice Ryhl <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Miguel Ojeda <[email protected]>
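For context, a `Default` implementation of the kind the commit messages above mention might look like the following. This is only a sketch: `Example` is a hypothetical struct, and the zero-initializing body is an assumption about what the generated bindings contain, inferred solely from the `write_bytes()` mention.

```rust
use core::mem::MaybeUninit;
use core::ptr;

// Hypothetical plain-data struct standing in for a generated binding type.
#[repr(C)]
#[derive(Debug, PartialEq)]
pub struct Example {
    pub id: u64,
    pub flags: u32,
}

impl Default for Example {
    fn default() -> Self {
        let mut value = MaybeUninit::<Self>::uninit();
        // SAFETY: all-zero bytes are a valid `Example` (it has only integer
        // fields), and `write_bytes` with a count of 1 fills exactly one
        // `Self`-sized slot, padding included.
        unsafe {
            ptr::write_bytes(value.as_mut_ptr(), 0, 1);
            value.assume_init()
        }
    }
}
```

Apart from the `MaybeUninit` plumbing, the body does nothing but call `write_bytes()`, which is the call the messages say the cost model now ignores.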
I noticed in a side project that a function which just compares two `[u64; 2]` for equality is not cross-crate-inlinable. That was surprising to me because I didn't think that code contained a function call, but of course our array comparisons are lowered to an intrinsic. Intrinsic calls don't make a function no longer a leaf, so it makes sense to add this as an exception to the "only leaves" cross-crate-inline heuristic.
This is the useful compare link: https://perf.rust-lang.org/compare.html?start=7cb1a81145a739c4fd858abe3c624ce8e6e5f9cd&end=c3f0a64dbf9fba4722dacf8e39d2fe00069c995e&stat=instructions%3Au because it disables CGU merging in both commits, so effects that cause changes in the sysroot to perturb partitioning downstream are excluded. Perturbations to what is and isn't cross-crate-inlinable in the sysroot have chaotic effects on what items are in which CGUs after merging. It looks like before this PR, by sheer luck, some of the CGUs dirtied by the patch in eza incr-unchanged happened to be merged together, and with this PR they are not.
The perf runs on this PR point to a nice runtime performance improvement.
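To make the description concrete, here is a minimal sketch of the kind of function it refers to, together with a toy model of an "only leaves" heuristic that ignores intrinsic calls. The names `same_key`, `Op`, and `is_cross_crate_inlinable`, the statement threshold, and the way costs are counted are all assumptions made for illustration; this is not the compiler's actual implementation.

```rust
// The kind of function the description refers to: the source contains no
// explicit call, but the array equality is lowered to an intrinsic call.
pub fn same_key(a: &[u64; 2], b: &[u64; 2]) -> bool {
    a == b
}

// Toy representation of a function body (hypothetical, not MIR).
#[derive(Clone, Copy, Debug)]
pub enum Op {
    Statement,
    Call { intrinsic: bool },
}

// Toy "only leaves" heuristic: a body qualifies if it is small and makes no
// calls. With `ignore_intrinsic_calls` set, intrinsic calls do not count.
pub fn is_cross_crate_inlinable(
    body: &[Op],
    threshold: usize,
    ignore_intrinsic_calls: bool,
) -> bool {
    let mut statements = 0;
    let mut calls = 0;
    for op in body {
        match op {
            Op::Statement => statements += 1,
            Op::Call { intrinsic } => {
                if !(*intrinsic && ignore_intrinsic_calls) {
                    calls += 1;
                }
            }
        }
    }
    calls == 0 && statements <= threshold
}
```

In this model, a body consisting of one statement and one intrinsic call is rejected when intrinsic calls are counted and accepted when they are ignored, while a body containing an ordinary call is rejected either way.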