Description
From rayon-rs/rayon#1268
While the original problem involved a `thread_local!` in `rayon-core`, I was able to reproduce the problem with just channels.
```rust
fn main() {
    let (tx, rx) = std::sync::mpsc::channel();
    std::thread::spawn(move || {
        tx.send(0).unwrap();
    });
    assert_eq!(rx.recv(), Ok(0));
}
```
I expected to see this happen: success!
Instead, this happened: SIGILL around here.
Meta
The rayon issue was fickle to reproduce: some configurations failed only with the new version, as reported there, while others failed even with older versions of `rayon-core` and the toolchain.
With the reproducer above, `cargo bisect-rustc` brought me to `nightly-2025-02-24` (compare), most likely due to this modification to the library code, which switched to `const` TLS initialization. As it happens, the new `rayon-core` version made the same change to the TLS in question. However, I think this is a red herring, since some configurations still reproduced the failure with older versions.
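For readers unfamiliar with the `const` TLS initialization mentioned above, here is a minimal sketch of the two `thread_local!` forms. The key names `LAZY` and `EAGER` and the helper `bump_both` are hypothetical; this only illustrates the syntax and per-thread semantics. It is not the actual library change from the bisected nightly, and it is not expected to reproduce the crash.

```rust
use std::cell::Cell;
use std::thread;

thread_local! {
    // Lazy form: the initializer expression runs on the first access
    // from each thread, so accesses may need to check init state.
    static LAZY: Cell<u32> = Cell::new(0);

    // `const` form: the initial value is a compile-time constant,
    // so no lazy-initialization step is needed for this key.
    static EAGER: Cell<u32> = const { Cell::new(0) };
}

/// Hypothetical helper: increments both thread-local counters and
/// returns their new values for the current thread.
fn bump_both() -> (u32, u32) {
    LAZY.with(|c| c.set(c.get() + 1));
    EAGER.with(|c| c.set(c.get() + 1));
    (LAZY.with(|c| c.get()), EAGER.with(|c| c.get()))
}

fn main() {
    // First call on the main thread.
    assert_eq!(bump_both(), (1, 1));
    // A spawned thread starts from its own fresh copies of both keys.
    assert_eq!(thread::spawn(bump_both).join().unwrap(), (1, 1));
    // The main thread's copies were unaffected by the other thread.
    assert_eq!(bump_both(), (2, 2));
}
```

Both keys behave identically at the source level; the difference lies in how `std` implements access to them, which is why switching to `const` initialization can change the TLS code sequences the backend emits.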
Looking at the assembly of the crash:
```
000261c0 <_ZN4core3ops8function6FnOnce9call_once17heccb642285848408E.llvm.5489154849748212803>:
   261c0:	48 00 00 09 	bl      261c8 <_ZN4core3ops8function6FnOnce9call_once17heccb642285848408E.llvm.5489154849748212803+0x8>
   261c4:	00 13 9c 10 	.long 0x139c10
   261c8:	7c 68 02 a6 	mflr    r3
   261cc:	80 83 00 00 	lwz     r4,0(r3)
   261d0:	7c 64 1a 14 	add     r3,r4,r3
   261d4:	3c 62 00 00 	addis   r3,r2,0
   261d8:	38 63 90 18 	addi    r3,r3,-28648
   261dc:	4e 80 00 20 	blr
```
I worked out that this `bl` writes the link register (LR) so that the TLS code can load that inline `.long`. However, since the original LR wasn't saved and restored, the `blr` that should have returned from this function instead branches to the address of that `.long` too, which isn't a valid instruction --> SIGILL!
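To make the control flow concrete, here is a toy Rust model (not an emulator) that tracks only the register effects of the disassembly above. The value of `r2` and the caller's return address are made up for illustration, and `run_with_lr_save` is a hypothetical sketch of the general idea of preserving LR around the `bl`; it is not necessarily what llvm/llvm-project#154654 generates.

```rust
/// Address and contents of the inline `.long` that follows the `bl`.
const INLINE_LONG_ADDR: u32 = 0x261c4;
const INLINE_LONG_VALUE: u32 = 0x0013_9c10;

/// Only the registers touched by the crashing function are modeled.
struct Cpu {
    pc: u32,
    lr: u32,
    r0: u32,
    r2: u32, // thread pointer in this code; the value used below is made up
    r3: u32,
    r4: u32,
}

impl Cpu {
    fn new(caller_return: u32) -> Self {
        // On entry, LR holds the address the caller expects us to return to.
        Cpu { pc: 0x261c0, lr: caller_return, r0: 0, r2: 0x7000_0000, r3: 0, r4: 0 }
    }
    /// `bl target`: LR = address of the next word, then branch.
    fn bl(&mut self, target: u32) {
        self.lr = self.pc + 4;
        self.pc = target;
    }
    /// Toy memory: only the inline `.long` is mapped.
    fn lwz(&self, addr: u32) -> u32 {
        assert_eq!(addr, INLINE_LONG_ADDR, "toy model only maps the inline .long");
        INLINE_LONG_VALUE
    }
    /// `blr`: branch to LR; returns where execution continues.
    fn blr(&mut self) -> u32 {
        self.pc = self.lr;
        self.pc
    }
}

/// The instructions between entry and `blr`, as disassembled.
fn tls_sequence(cpu: &mut Cpu) {
    cpu.bl(0x261c8); // 261c0: bl 261c8  (clobbers LR with 0x261c4)
    cpu.r3 = cpu.lr; // 261c8: mflr r3
    cpu.r4 = cpu.lwz(cpu.r3); // 261cc: lwz r4,0(r3)
    cpu.r3 = cpu.r4.wrapping_add(cpu.r3); // 261d0: add r3,r4,r3
    cpu.r3 = cpu.r2; // 261d4: addis r3,r2,0
    cpu.r3 = cpu.r3.wrapping_add(-28648i32 as u32); // 261d8: addi r3,r3,-28648
}

/// As compiled: LR is never saved, so `blr` lands on the `.long`.
fn run_buggy(caller_return: u32) -> u32 {
    let mut cpu = Cpu::new(caller_return);
    tls_sequence(&mut cpu);
    cpu.blr()
}

/// Hypothetical variant that stashes LR in r0 and restores it before returning.
fn run_with_lr_save(caller_return: u32) -> u32 {
    let mut cpu = Cpu::new(caller_return);
    cpu.r0 = cpu.lr; // mflr r0
    tls_sequence(&mut cpu);
    cpu.lr = cpu.r0; // mtlr r0
    cpu.blr()
}

fn main() {
    let caller_return = 0x1_0000;
    assert_eq!(run_buggy(caller_return), INLINE_LONG_ADDR);
    assert_eq!(run_with_lr_save(caller_return), caller_return);
}
```

In the buggy run, the return target no longer depends on the caller at all: it is always the address of the inline data word.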
Hopefully this will be fixed by llvm/llvm-project#154654, and we can close this when we've upgraded LLVM.
I'm not sure we can write a useful Rust regression test, though, because the problem goes away as soon as the surrounding function makes any real `bl` function call nearby, since it will then save and restore LR for that call. The reproducer above works (that is, fails) right now, but different inlining or changes to the channel implementation could stop it from reproducing, which makes it a fragile regression test. I also tried more direct `thread_local!` reproducers, but couldn't get them to generate the same kind of relocation code.