-
Notifications
You must be signed in to change notification settings - Fork 13.8k
interpret: copy_provenance: avoid large intermediate buffer for large repeat counts #146331
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
rustbot has assigned @petrochenkov. Use |
@bors try |
This comment has been minimized.
This comment has been minimized.
interpret: copy_provenance: avoid large intermediate buffer for large repeat counts
This comment has been minimized.
This comment has been minimized.
self.ptrs.insert_presorted( | ||
copy.ptrs.iter().map(|&(offset, reloc)| (shift_offset(i, offset), reloc)), | ||
); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the one change that may cost us performance... if there's a bunch of provenance after the range we are copying into, but none in the given range (i.e. we are hitting the fast-path in insert_presorted), it'll be moved around each time.
We'd need a single iterator yielding the entire repeated range to avoid this... which I guess is possible. Maybe I should try that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's pretty common for such operations to only be used when the rest is uninit. I guess that will still pessimize the other cases, but without more benchmarks I don't know if we should try anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW the 2nd commit does implement the "single iterator" strategy to avoid copying N times for an array of size N.
insert_presorted
could still have its slowpath improved by a lot, but that's orthogonal to this PR.
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (6215de8): comparison URL. Overall result: ❌ regressions - no action neededBenchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf. @bors rollup=never Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (primary 4.1%, secondary 1.3%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (primary 0.1%, secondary 2.9%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 466.876s -> 466.292s (-0.13%) |
☔ The latest upstream changes (presumably #146360) made this pull request unmergeable. Please resolve the merge conflicts. |
317df91
to
d098505
Compare
This PR was rebased onto a different master commit. Here's a range-diff highlighting what actually changed. Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers. |
@bors try |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
interpret: copy_provenance: avoid large intermediate buffer for large repeat counts
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (cd436dc): comparison URL. Overall result: ❌✅ regressions and improvements - no action neededBenchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf. @bors rollup=never Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)This benchmark run did not return any relevant results for this metric. CyclesResults (primary -2.1%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 467.841s -> 468.775s (0.20%) |
This looks slightly better than the first run (which only had the first commit), though that might be noise. |
d098505
to
7abbc9c
Compare
r? compiler |
r? @oli-obk |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the things we do for perf 😆
pub fn insert_presorted( | ||
&mut self, | ||
// We require `TrustedLen` to ensure that the `splice` below is actually efficient. | ||
mut elements: impl Iterator<Item = (K, V)> + DoubleEndedIterator + TrustedLen, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could use IntoIterator<Item = (K, V), Iter: DoubleEndedIterator + TrustedLen>
to avoid needing to change callers, but since it's mostly tests, doesn't really matter
@bors r+ |
☀️ Test successful - checks-actions |
What is this?This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.Comparing ce6daf3 (parent) -> 5d1b897 (this PR) Test differencesShow 3 test diffs3 doctest diffs were found. These are ignored, as they are noisy. Test dashboardRun cargo run --manifest-path src/ci/citool/Cargo.toml -- \
test-dashboard 5d1b897a07dc30d810dd541795125c1c216266c7 --output-dir test-dashboard And then open Job duration changes
How to interpret the job duration changes?Job durations can vary a lot, based on the actual runner instance |
Finished benchmarking commit (5d1b897): comparison URL. Overall result: ❌✅ regressions and improvements - no action needed@rustbot label: -perf-regression Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (primary -9.2%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (secondary -3.2%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 472.939s -> 471.616s (-0.28%) |
Copying provenance worked in this odd way where the "preparation" phase (which is supposed to just extract the necessary information from the source range) already did all the work of repeating the result N times for the target range. This was needed to use the existing
insert_presorted
function onSortedMap
.This PR generalizes
insert_presorted
so that we can avoid this odd structure on copy-provenance, and maybe even improve performance.