Skip to content

Conversation

jonas-schievink
Copy link
Contributor

@jonas-schievink jonas-schievink commented Aug 25, 2020

Disclaimer: I have not tested any of this (help wanted, we have no tests in here).

Thanks to @therealprof for nerd-sniping me into oblivion.

Fixes #254
Fixes #194
Fixes #139

Summary

  • Remove the assembly files in favor of a new asm.rs, which uses unstable inline assembly and provides a C ABI interface.
  • Replace the shell scripts by a cargo-xtask.
  • While we're at it, also pre-build artifacts that are compatible with linker-plugin LTO, fixing Support Linker Plugin to inline ASM #139 (again, not tested)

This means that contributors and maintainers just need a nightly Rust compiler installed to run cargo xtask assemble. No binutils, no assembler, no ar, no GCC/Clang, and especially nothing from the godawful Arm servers, fixing #194. You don't even have to install the correct nightly Rust toolchain, cargo xtask does it for you (and installs all the thumb targets too).

@rust-highfive
Copy link

r? @thalesfragoso

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-cortex-m labels Aug 25, 2020
@thalesfragoso
Copy link
Member

Great work, I did some tests here, targeting the plugin-lto. First, it seems that if I use -Clinker-plugin-lto I can't compile with opt-level= "s" (and probably "z"), because I get this error:

rust-lld: error: -plugin-opt=Os: number expected, but got 's'

The following tests used opt-level=3, I wanted to try our Mutex with plugin-lto, so here is the code:

static FLAG: Mutex<Cell<bool>> = Mutex::new(Cell::new(false));

#[entry]
fn main() -> ! {
    while !free(|cs| FLAG.borrow(cs).get()) {}

    unsafe {
        ptr::write_volatile(0x40000000 as *mut u32, 1);
    }
    loop {
        asm::nop();
    }
}

#[exception]
fn SysTick() {
    free(|cs| FLAG.borrow(cs).set(true));
}

With cortex-m from crates.io:

Dump of assembler code for function main:
   0x08000400 <+0>:     push    {r7, lr}
   0x08000402 <+2>:     mov     r7, sp
   0x08000404 <+4>:     bl      0x800040a <plugin_test::__cortex_m_rt_main>
   0x08000408 <+8>:     udf     #254    ; 0xfe
End of assembler dump.

Dump of assembler code for function plugin_test::__cortex_m_rt_main:
   0x0800040a <+0>:     push    {r7, lr}
   0x0800040c <+2>:     mov     r7, sp
   0x0800040e <+4>:     movw    r5, #0
   0x08000412 <+8>:     movt    r5, #8192       ; 0x2000
   0x08000416 <+12>:    bl      0x800050c <__primask>
   0x0800041a <+16>:    mov     r4, r0
   0x0800041c <+18>:    bl      0x8000502 <__cpsid>
   0x08000420 <+22>:    lsls    r0, r4, #31
   0x08000422 <+24>:    ldrb    r6, [r5, #0]
   0x08000424 <+26>:    it      eq
   0x08000426 <+28>:    bleq    0x8000506 <__cpsie>
   0x0800042a <+32>:    cmp     r6, #0
   0x0800042c <+34>:    beq.n   0x8000416 <plugin_test::__cortex_m_rt_main+12>
   0x0800042e <+36>:    mov.w   r0, #1073741824 ; 0x40000000
   0x08000432 <+40>:    movs    r1, #1
   0x08000434 <+42>:    str     r1, [r0, #0]
   0x08000436 <+44>:    bl      0x800050a <__nop>
   0x0800043a <+48>:    b.n     0x8000436 <plugin_test::__cortex_m_rt_main+44>
End of assembler dump.

With this branch without plugin-lto:

Dump of assembler code for function main:
   0x08000400 <+0>:     push    {r7, lr}
   0x08000402 <+2>:     mov     r7, sp
   0x08000404 <+4>:     bl      0x800040a <plugin_test::__cortex_m_rt_main>
   0x08000408 <+8>:     udf     #254    ; 0xfe
End of assembler dump.

Dump of assembler code for function plugin_test::__cortex_m_rt_main:
   0x0800040a <+0>:     push    {r7, lr}
   0x0800040c <+2>:     mov     r7, sp
   0x0800040e <+4>:     movw    r5, #0
   0x08000412 <+8>:     movt    r5, #8192       ; 0x2000
   0x08000416 <+12>:    bl      0x800050e <asm::__primask>
   0x0800041a <+16>:    mov     r4, r0
   0x0800041c <+18>:    bl      0x8000502 <asm::__cpsid>
   0x08000420 <+22>:    lsls    r0, r4, #31
   0x08000422 <+24>:    ldrb    r6, [r5, #0]
   0x08000424 <+26>:    it      eq
   0x08000426 <+28>:    bleq    0x8000506 <asm::__cpsie>
   0x0800042a <+32>:    cmp     r6, #0
   0x0800042c <+34>:    beq.n   0x8000416 <plugin_test::__cortex_m_rt_main+12>
   0x0800042e <+36>:    mov.w   r0, #1073741824 ; 0x40000000
   0x08000432 <+40>:    movs    r1, #1
   0x08000434 <+42>:    str     r1, [r0, #0]
   0x08000436 <+44>:    bl      0x800050a <asm::__nop>
   0x0800043a <+48>:    b.n     0x8000436 <plugin_test::__cortex_m_rt_main+44>
End of assembler dump.

Pretty much the same thing, only the address of primask() that is a bit different.

Now with plugin-lto:

Dump of assembler code for function main:
   0x08000400 <+0>:     push    {r7, lr}
   0x08000402 <+2>:     mov     r7, sp
   0x08000404 <+4>:     bl      0x8000408 <plugin_test::__cortex_m_rt_main>
End of assembler dump.

Dump of assembler code for function plugin_test::__cortex_m_rt_main:
   0x08000408 <+0>:     push    {r7, lr}
   0x0800040a <+2>:     mov     r7, sp
   0x0800040c <+4>:     movw    r0, #0
   0x08000410 <+8>:     movt    r0, #8192       ; 0x2000
   0x08000414 <+12>:    b.n     0x8000418 <plugin_test::__cortex_m_rt_main+16>
   0x08000416 <+14>:    cbnz    r1, 0x8000428 <plugin_test::__cortex_m_rt_main+32>
   0x08000418 <+16>:    mrs     r2, PRIMASK
   0x0800041c <+20>:    cpsid   i
   0x0800041e <+22>:    ldrb    r1, [r0, #0]
   0x08000420 <+24>:    lsls    r2, r2, #31
   0x08000422 <+26>:    bne.n   0x8000416 <plugin_test::__cortex_m_rt_main+14>
   0x08000424 <+28>:    cpsie   i
   0x08000426 <+30>:    b.n     0x8000416 <plugin_test::__cortex_m_rt_main+14>
   0x08000428 <+32>:    mov.w   r0, #1073741824 ; 0x40000000
   0x0800042c <+36>:    movs    r1, #1
   0x0800042e <+38>:    str     r1, [r0, #0]
   0x08000430 <+40>:    nop
   0x08000432 <+42>:    b.n     0x8000430 <plugin_test::__cortex_m_rt_main+40>
End of assembler dump.

So everything seems to be working great, at least in this example. The plugin-lto managed to do the inlining and some more optimizations because of that, but it didn't optimize our read to the Mutex. I expect that we can rely on that, because the asm!() macro does clobbering.

@Disasm
Copy link
Member

Disasm commented Aug 27, 2020

I suspect that linking to -lto archives will stop working after then next LLVM version bump in Rust (and probably with old Rust versions too), but otherwise this approach looks great :)

@jonas-schievink
Copy link
Contributor Author

From what I gathered, ThinLTO summaries are at least forwards-compatible, so we should be able to keep using old archives with newer Rust versions. I haven't tested this though (not sure if there's any way to do that).

@Disasm
Copy link
Member

Disasm commented Aug 27, 2020

@jonas-schievink One problem I noticed while experimenting with LLVM binaries for ThinLTO is that sometimes even data layout string changes, making the binary incompatible with other LLVM versions. Probably it's a good idea to have multiple binaries, one for each LLVM version. We can even fallback to regular .o files when there is no -lto archive for the "current" LLVM version.

@jonas-schievink
Copy link
Contributor Author

I don't really want to set that up, if users really want their asm inlined they can use a Rust version that's compatible

@therealprof
Copy link
Contributor

Yeah, I'd rather mark LTO as an experimental feature for now and not block this very nice improvement.

@therealprof
Copy link
Contributor

Looks good to me, I'd have made the disclaimer a bit more explicit but hey.

Anything else have any opinion on this? Otherwise I'd be tempted to approve it.

Copy link
Member

@thalesfragoso thalesfragoso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, thanks again Jonas.

bors r+

@bors
Copy link
Contributor

bors bot commented Aug 27, 2020

Build succeeded:

@bors bors bot merged commit 36886f1 into rust-embedded:master Aug 27, 2020
bors bot added a commit that referenced this pull request Aug 31, 2020
262: Merge asm implementations r=therealprof a=jonas-schievink

This replaces the implementation of `inline-asm` with the file I wrote in #259 (and some fixes).

All functions that call assembly now do so via a `call_asm!` macro that either dispatches to a call to an `#[inline(always)]` function containing the inline `asm!`, or to the FFI shim. This makes all functions that call into asm significantly shorter.

The FFI shim is now also macro-generated, which makes it very small.

Co-authored-by: Jonas Schievink <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-cortex-m
Projects
None yet
Development

Successfully merging this pull request may close these issues.

check-blobs.sh does nothing Remove Arm server access from CI Support Linker Plugin to inline ASM
5 participants