Open
Labels: A-LLVM, C-optimization, P-high, T-compiler, T-libs, regression-untriaged
Description
Code
I tried this code:
```rust
use std::{iter::Peekable, str::CharIndices};
use std::sync::LazyLock as Lazy;

use memchr::memmem::Finder;
use itertools::Itertools;

struct Foo<'a> {
    txt: &'a str,
    ci: Peekable<CharIndices<'a>>,
}

impl<'a> Foo<'a> {
    fn new(txt: &'a str) -> Self {
        let ci = txt.char_indices().peekable();
        Self { txt, ci }
    }

    fn next(&mut self) -> Option<(usize, usize)> {
        static FI: Lazy<Finder> = Lazy::new(|| Finder::new(b"\n"));

        // XXX: switching between these two lines makes a difference
        let &(start, _) = self.ci.peek()?;
        //let start = self.ci.peek().map(|&(idx, _)| idx)?;

        let matches = FI
            .find_iter(&self.txt.as_bytes()[start..])
            .map(|idx| idx + start);

        for next_match in matches {
            if self.txt.is_char_boundary(next_match) {
                self.ci.by_ref()
                    //.take_while(|&(idx, _)| idx < next_match)
                    .peeking_take_while(|&(idx, _)| idx <= next_match)
                    .for_each(drop);
                return Some((start, next_match))
            }
        }

        self.ci.by_ref().for_each(drop);
        Some((start, self.txt.len()))
    }
}

fn main() {
    let s = "1".repeat(20_000) + "\n";
    let sn = s.repeat(200_000);
    let mut v = Foo::new(&sn);
    let mut p = Vec::with_capacity(200_000);
    while let Some(r) = v.next() {
        p.push(r);
    }
}
```
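When comparing the `with_map` and `without_map` builds, it can help to confirm that both produce the same spans. Below is a hypothetical std-only reference (the name `line_spans` is made up for this sketch, and it uses `str::match_indices` instead of `memchr`). Assuming the reading of `Foo::next` is right, it yields one `(start, end)` pair per line, where `end` is the byte offset of the `\n`, or `txt.len()` for a final line without a trailing newline.

```rust
/// Hypothetical std-only reference for cross-checking the output of `Foo::next`.
/// Returns one (start, end) pair per line; `end` is the byte offset of the '\n',
/// or `txt.len()` for a trailing line that has no newline.
fn line_spans(txt: &str) -> Vec<(usize, usize)> {
    let mut spans = Vec::new();
    let mut start = 0;
    // A '\n' byte in valid UTF-8 is always a char boundary, so every match counts,
    // just as every `is_char_boundary(next_match)` check in `Foo::next` succeeds.
    for (idx, _) in txt.match_indices('\n') {
        spans.push((start, idx));
        // The newline itself is consumed, mirroring `idx <= next_match`.
        start = idx + 1;
    }
    // Remaining text after the last newline (if any) forms a final span.
    if start < txt.len() {
        spans.push((start, txt.len()));
    }
    spans
}

fn main() {
    // Same shape as the benchmark input, scaled down.
    let s = "1".repeat(20) + "\n";
    let sn = s.repeat(3);
    let spans = line_spans(&sn);
    assert_eq!(spans, vec![(0, 20), (21, 41), (42, 62)]);
    println!("{spans:?}");
}
```

Collecting `p` from either build and comparing it with `line_spans(&sn)` would be one way to check that both variants do the same work.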
Cargo.toml
```toml
[package]
name = "tt-fluctuations"
version = "0.1.0"
edition = "2024"

[dependencies]
itertools = "0.14.0"
memchr = "2.7.4"

[profile.release]
codegen-units = 1
lto = true
strip = true
```
Cargo.lock
```toml
# This file is automatically @generated by Cargo.
# It is not intended for manual editing.
version = 4

[[package]]
name = "either"
version = "1.15.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "48c757948c5ede0e46177b7add2e67155f70e33c07fea8284df6576da70b3719"

[[package]]
name = "itertools"
version = "0.14.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2b192c782037fadd9cfa75548310488aabdbf3d2da73885b31bd0abd03351285"
dependencies = [
 "either",
]

[[package]]
name = "memchr"
version = "2.7.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "78ca9ab1a0babb1e7d5695e3530886289c18cf2f87ec19a575a0abdce112e3a3"

[[package]]
name = "tt-fluctuations"
version = "0.1.0"
dependencies = [
 "itertools",
 "memchr",
]
```
Commits used to test:
- 2162e9d (or stable 1.86.0) => before LLVM 20
- ce36a96 (Update to LLVM 20 #135763) => first regression
- 934880f (speed up `String::push` and `String::insert` #124810) => second regression
Besides the two direct regressions, it is curious that alternating between these two lines can make a big difference in performance:
let &(start, _) = self.ci.peek()?; // called without_map in bench
let start = self.ci.peek().map(|&(idx, _)| idx)?; // with_map
But maybe that's a different issue.
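As far as the language semantics go, the two lines are interchangeable: both call `Peekable::peek` (which does not consume the element), copy out the byte index, and return early with `None` via `?` once the iterator is exhausted. The sketch below, using hypothetical helper names, checks this on a small string; the timing gap between `with_map` and `without_map` therefore presumably comes from code generation rather than from the program doing different work.

```rust
use std::iter::Peekable;
use std::str::CharIndices;

// Hypothetical helper for the `without_map` variant: destructure the peeked tuple.
fn start_without_map(ci: &mut Peekable<CharIndices<'_>>) -> Option<usize> {
    let &(start, _) = ci.peek()?;
    Some(start)
}

// Hypothetical helper for the `with_map` variant: project the index with `Option::map`.
fn start_with_map(ci: &mut Peekable<CharIndices<'_>>) -> Option<usize> {
    let start = ci.peek().map(|&(idx, _)| idx)?;
    Some(start)
}

fn main() {
    // Includes a multi-byte char so byte indices and char counts differ.
    let txt = "ab\nçd";
    let mut a = txt.char_indices().peekable();
    let mut b = txt.char_indices().peekable();
    loop {
        let (x, y) = (start_without_map(&mut a), start_with_map(&mut b));
        // Both forms observe the same byte index (or both see `None`).
        assert_eq!(x, y);
        if x.is_none() {
            break;
        }
        // Neither helper consumes anything; advance both iterators in lockstep.
        let _ = a.next();
        let _ = b.next();
    }
    println!("both variants agree");
}
```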
Summary of hyperfine run:
```text
Summary
  ./2162e9d_with_map ran
    1.06 ± 0.01 times faster than ./ce36a96_with_map
    1.12 ± 0.01 times faster than ./2162e9d_without_map
    1.38 ± 0.01 times faster than ./934880f586f_with_map
    1.66 ± 0.01 times faster than ./ce36a96_without_map
    1.91 ± 0.01 times faster than ./934880f586f_without_map
```
Full numbers
```text
Benchmark 1: ./2162e9d_with_map
  Time (mean ± σ):      3.894 s ±  0.014 s    [User: 3.202 s, System: 0.682 s]
  Range (min … max):    3.876 s …  3.909 s    4 runs

Benchmark 2: ./2162e9d_without_map
  Time (mean ± σ):      4.380 s ±  0.034 s    [User: 3.686 s, System: 0.683 s]
  Range (min … max):    4.346 s …  4.419 s    4 runs

Benchmark 3: ./934880f586f_with_map
  Time (mean ± σ):      5.388 s ±  0.020 s    [User: 4.702 s, System: 0.671 s]
  Range (min … max):    5.369 s …  5.415 s    4 runs

Benchmark 4: ./934880f586f_without_map
  Time (mean ± σ):      7.448 s ±  0.018 s    [User: 6.758 s, System: 0.666 s]
  Range (min … max):    7.423 s …  7.465 s    4 runs

Benchmark 5: ./ce36a96_with_map
  Time (mean ± σ):      4.128 s ±  0.016 s    [User: 3.448 s, System: 0.666 s]
  Range (min … max):    4.111 s …  4.149 s    4 runs

Benchmark 6: ./ce36a96_without_map
  Time (mean ± σ):      6.448 s ±  0.007 s    [User: 5.734 s, System: 0.696 s]
  Range (min … max):    6.441 s …  6.457 s    4 runs
```
So building with the current nightly and using `let &(start, _) = self.ci.peek()?;` is almost twice as slow (1.91×) as building with stable and using `let start = self.ci.peek().map(|&(idx, _)| idx)?;`.
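To separate the effect of each regression per variant, the mean times from the full hyperfine output can be divided pairwise (the same ratio hyperfine reports as "times faster"). A small sketch; the struct and constant names are illustrative only:

```rust
/// Mean run times in seconds for one commit, from the hyperfine "Full numbers".
struct Means {
    with_map: f64,
    without_map: f64,
}

const BEFORE_LLVM20: Means = Means { with_map: 3.894, without_map: 4.380 }; // 2162e9d
const LLVM20: Means = Means { with_map: 4.128, without_map: 6.448 }; // ce36a96
const STRING_PUSH: Means = Means { with_map: 5.388, without_map: 7.448 }; // 934880f (#124810)

/// Slowdown factor of `slow` relative to `fast`.
fn slowdown(slow: f64, fast: f64) -> f64 {
    slow / fast
}

fn main() {
    // Step-by-step slowdown for each variant: LLVM 20, then #124810 on top.
    println!(
        "without_map: LLVM 20 x{:.2}, #124810 x{:.2}",
        slowdown(LLVM20.without_map, BEFORE_LLVM20.without_map),
        slowdown(STRING_PUSH.without_map, LLVM20.without_map)
    );
    println!(
        "with_map:    LLVM 20 x{:.2}, #124810 x{:.2}",
        slowdown(LLVM20.with_map, BEFORE_LLVM20.with_map),
        slowdown(STRING_PUSH.with_map, LLVM20.with_map)
    );
    // Headline comparison: nightly + without_map vs stable + with_map.
    println!(
        "worst vs best: x{:.2}",
        slowdown(STRING_PUSH.without_map, BEFORE_LLVM20.with_map)
    );
}
```

With these numbers, `without_map` slows down by roughly 1.47× with LLVM 20 and another 1.16× with #124810, while `with_map` slows down by roughly 1.06× and 1.31×; the overall worst/best ratio is the 1.91× from the summary.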
Version it worked on
It most recently worked on: 2162e9d
Version with regression
First regression
`rustc --version --verbose`:
```text
rustc 1.87.0-nightly (ce36a966c 2025-02-17)
binary: rustc
commit-hash: ce36a966c79e109dabeef7a47fe68e5294c6d71e
commit-date: 2025-02-17
host: x86_64-unknown-linux-gnu
release: 1.87.0-nightly
LLVM version: 20.1.0
```
Second regression
`rustc --version --verbose`:
```text
rustc 1.88.0-nightly (934880f58 2025-04-09)
binary: rustc
commit-hash: 934880f586f6ac1f952c7090e2a943fcd7775e7b
commit-date: 2025-04-09
host: x86_64-unknown-linux-gnu
release: 1.88.0-nightly
LLVM version: 20.1.2
```