thread_local! access hits SIGILL on powerpc-unknown-linux-gnu #145693

@cuviper

Description

From rayon-rs/rayon#1268

While the original problem involved a thread_local! in rayon-core, I was able to reproduce the problem with just channels.

fn main() {
    let (tx, rx) = std::sync::mpsc::channel();
    std::thread::spawn(move || {
        tx.send(0).unwrap();
    });
    assert_eq!(rx.recv(), Ok(0));
}

I expected to see this happen: success!

Instead, this happened: SIGILL around here.

Meta

The rayon issue was fickle to reproduce: some configurations failed only with the new version, as reported, while others failed even with older versions of rayon-core and the toolchain.

With the reproducer above, cargo bisect-rustc brought me to nightly-2025-02-24, most likely because of a library change that switched to const TLS init. The new rayon-core version happens to have made the same change to the TLS in question. However, I think this is a red herring, since some configurations still reproduced on older versions.
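For anyone unfamiliar with the term, "const TLS init" refers to the `const { ... }` initializer form of thread_local!. The following is only a minimal sketch of the two forms side by side. The names LAZY_COUNTER, CONST_COUNTER and bump_both are hypothetical, and this is not the actual std or rayon-core code.

```rust
use std::cell::Cell;

thread_local! {
    // Lazy form: the initializer expression runs on first access in each thread.
    static LAZY_COUNTER: Cell<u32> = Cell::new(0);

    // Const form: the initializer is a constant expression, so no lazy
    // initialization check is needed on access.
    static CONST_COUNTER: Cell<u32> = const { Cell::new(0) };
}

/// Increments both thread-local counters and returns their current values
/// for the calling thread. (Hypothetical helper, for illustration only.)
fn bump_both() -> (u32, u32) {
    LAZY_COUNTER.with(|c| c.set(c.get() + 1));
    CONST_COUNTER.with(|c| c.set(c.get() + 1));
    (
        LAZY_COUNTER.with(|c| c.get()),
        CONST_COUNTER.with(|c| c.get()),
    )
}
```

Both forms behave the same from the caller's point of view; only the generated access code differs. As noted in this report, a direct thread_local! like this did not produce the failing relocation sequence.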

Looking at the assembly of the crash:

000261c0 <_ZN4core3ops8function6FnOnce9call_once17heccb642285848408E.llvm.5489154849748212803>:
   261c0:	48 00 00 09 	bl      261c8 <_ZN4core3ops8function6FnOnce9call_once17heccb642285848408E.llvm.5489154849748212803+0x8>
   261c4:	00 13 9c 10 	.long 0x139c10
   261c8:	7c 68 02 a6 	mflr    r3
   261cc:	80 83 00 00 	lwz     r4,0(r3)
   261d0:	7c 64 1a 14 	add     r3,r4,r3
   261d4:	3c 62 00 00 	addis   r3,r2,0
   261d8:	38 63 90 18 	addi    r3,r3,-28648
   261dc:	4e 80 00 20 	blr

I worked out that this bl sets the link register (LR) so that the TLS code can locate that inline .long. However, since the original LR wasn't saved and restored, the blr that should have returned from this function instead jumps to the address of that .long, which isn't a valid instruction --> SIGILL!
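The LR clobber described above can be sketched with a toy model. This is not an emulator: it tracks only LR, the two addresses come from the disassembly, and the function name, the `save_lr` flag and the caller's return address are hypothetical values chosen for illustration.

```rust
/// Address of the function entry and of the inline data word,
/// both taken from the disassembly listing.
const FUNC_START: u32 = 0x261c0;
const INLINE_WORD: u32 = 0x261c4; // the `.long 0x139c10`

/// Returns the address that the final `blr` jumps to.
///
/// `caller_return` is the value LR holds on entry (where the caller expects
/// the function to return). `save_lr` models whether the function preserves
/// LR around the `bl`, as a normal prologue/epilogue would.
fn blr_target(caller_return: u32, save_lr: bool) -> u32 {
    let mut lr = caller_return;
    let saved = lr; // what a prologue would have stashed

    // bl 261c8: LR <- address of the instruction after the `bl`,
    // which here is the inline data word, not code.
    lr = FUNC_START + 4;

    // mflr r3: reads LR to locate the inline word; LR itself is unchanged.
    let _r3 = lr;

    // An epilogue would restore LR before returning; the buggy code does not.
    if save_lr {
        lr = saved;
    }

    lr // blr: jump to whatever LR holds now
}
```

With `save_lr` set to false, `blr_target` returns INLINE_WORD for any caller address, which matches the crash: execution lands on the data word at 0x261c4. With `save_lr` set to true, it returns the caller's address, which is what happens when a nearby real function call forces LR to be saved and restored.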

Hopefully this will be fixed by llvm/llvm-project#154654, and we can close this when we've upgraded LLVM.

I'm not sure we can write a useful Rust regression test though, because the problem goes away as soon as the surrounding function makes any real bl function call nearby, as it will save/restore LR for that. The reproducer above works (fails) right now, but different inlining or changes to the channel implementation might break this as a regression test. I also tried more direct thread_local! reproducers, but couldn't get it to generate the same kind of relocation code.

Labels: A-LLVM, C-bug, O-PowerPC, T-compiler, llvm-fixed-upstream