-
Notifications
You must be signed in to change notification settings - Fork 8.1k
arch/arm: New arch_switch() based context layer for Cortex M #85248
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This is finally looking good enough to submit, let's see how it runs in CI. First, it's important to note that @ithinuel has an entirely different arch_switch() implementation in #85080 that everyone should review too. That one is a relatively straight-line evolution of the current PendSV implementation. This one is (as I'm sure surprises no one) more of a rewrite, using a "normal" context switch. Really I don't see any reason why both shouldn't be able to merge: this will likely take some time to stabilize and we'd want to be maintaining the old stuff in parallel anyway. The big advantages to this one over that one:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am still going through the changes but overall I think using cmsis apis would improve readability and address concerns like having isb at required places.
There seems to be a older comments that are not addressed yet from ithinuel and others, could you please have a look and reply to those as well?
For the issue you see with Corstone310/315/320 FVP boards, I don't think this should block the current PR so lets go with this change for now and raise a github issue assigned to me. I will start looking at it as soon as I am done with review of this PR.
Some notes on the ISB pedantry. :) Basically "exactly who demands this and in what specification?" And the CMSIS stuff is of a piece. I have a comment above from the spring about the same subject, but he core reason that I don't like HAL code is that it's filled with needless conservatism like this and semi-voodoo proscriptions that don't align with the documented behavior of the hardware. And especially that in general the HAL documentation is much worse than the ISA documentation is. I can read a formal behavior of BASEPRI and MSR and CONTROL in the architecture spec that tells me exactly what the instruction is going to do, right down to the level of interpretable pseudo code, and tell me how many pipeline stages the following instruction will be delayed. At best things like __set_CONTROL give you a general sense of what's going to happen, but often leave out critically important details (like in this case, "Oh, yeah, it does a pipeline stall too). Basically I've been burned repeatedly by HAL layers. CMSIS is surely better than Xtensa, but I still don't trust it. |
But that said, I'm fine if someone wants to come by later with a cleanup pass pointing out that this or that assembly sequence can be done with fewer instructions or whatever with CMSIS. I'm not a zealot. I'm just saying that given the choice between following instructions left by a software developer or a hardware engineer, I'm going to pick the hardware team every single time. |
I can follow your line of thoughts, Andy! I've been taught in my ARM training "change CONTROL, use ISB". But the more I look to it, the more I see it as "a cooking recipe" to be on the safe-side for the case one indeed needs an ISB. And if one doesn't, one just 'wastes some cycles' and this is often seen as the least evil for most applications. Of course, here it matters! I checked Joseph Yiu's definitive guide for the Cortex M23/33, which has a dedicated chapter on "OS support features". For the context switch code, he mentions "executes an ISB after CONTROL updates (architecture recommendation)", but without any further explanation. It would be interesting to see what he wrote on the previous book dealing with the M3/M4. While the Armv7-M Architecture Reference is quite detailed about ISB uses, the Armv8-M remains rather elusive on the topic. Or at least, that's how feel, perhaps I'm not looking at the right place. There are interesting aspects to be presented and discussed about this PR. I'm really looking forward to your talk at the ZDS@Amstedam! |
Tangential issue relating to consequences, not the implementation. |
They will during an exception, the pickling doesn't happen until context switch. If you want to inspect an OS-suspended thread, then yes, an adaptation layer will be needed to do the conversion. The routines are there, something just needs to call them in the right spots. Edit: actually does that even work right now? Every hardware debugger interface I've used only treats with CPU state, where there are multiple "threads" in gdb in these setups they reflect SMP contexts and not OS threads. Does J-Link have the intelligence to make this work? |
@hakehuang is it possible to test this PR on NXP boards with a full test cycle run. |
I tested on the MCXN947. It would be awesome to have more NXP boards! |
For anyone who wants to test this PR on a test bench:
(I didn't find a trivial way to filter for just Cortex-M platforms.) |
According to regression test on all
|
Just me thinking aloud... BBC Micro Bit board uses nRF51822 SoC with ARM Cortex-M0 at 16 MHz 256 KB Flash and 16 KB RAM. This board (though deprecated by the Vendor) is always a favorite/challenge to have functional simple Bluetooth features (say, peripheral role heart rate service with encrypted connections). Current upstream main shows Radio ISR latencies of ~ 90 us (observed in #95191) of which ARM Cortex-M0 wakeup with hardware program frame stacking should ideally be 10 us (SoC CPU clock and flash access settling time included). And say, can we get down to the Zephyr ISR vector overheads to be another 10 us (i.e. 16 MHz Cortex-M0) ? Bluetooth implementations have a hard deadline to process Radio ISRs on on-air packet reception (last bit on-air received to first bit on-air to transmit) as low as 230 us (1M PHY). Having a 90 us Radio ISR latency is a bottleneck today (may be this is a Controller design induced latencies due to other software interrupt at same priority introducing Radio ISR latencies too). Gradually changes/refactoring to zephyr's primitive implementations have steadily increased latencies (may be gone up and then after some optimizations come down). Hope, this BBC Micro Bit board can leave longer in the Zephyr Project (truely being an OS for the resource constraint)! Related: #74345 |
Is there a test/benchmark in Zephyr that best highlights the performance differences using this PR? I'd like to test EDIT: I ran a quick check using the new benchmark added by this PR: FRDM_MCXN947 (CM33) RT1170 CM7 RT1170 CM4 I can't find a better test/benchmark to highlight the performance differences right now. |
|
What's the status with this? There seemed to be really good momentum with getting this mergable, but now it's been pretty quiet for three weeks. |
@jhedberg : Most of my review comments have been resolved. What is left is the PSPLIM bug in tests/arch/arm/arm_switch/src/main.c, 2 minor issues (erroneous comment/ use of new arch_switch in NS domain), and resolve if the ISB is needed in one specific case. |
Oops, didn't realize it needs a rebase. Will do that tonight. I think this is probably safe to merge, given the old code remains functional and there are no API changes. And I'll revisit the review comments; I'd convinced myself they were all done, please point me at what's needed. |
In general this is what I followed:
From section B1.4.4 of the Armv7-M arch ref manual (https://developer.arm.com/documentation/ddi0403/ee/?lang=en): Also in section B1.4.4 of the Armv6-M arch ref manual (https://developer.arm.com/documentation/ddi0419/e/?lang=en)
CMSIS is widely adopted across Zephyr and its HALs, which shows it's both mature and stable. If there were real issues in its APIs, they'd likely show up broadly across the system, not just in switch code. |
I still don't think IAR is working, but I haven't nailed down exactly why. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IAR assembler needs this.
*/ | ||
__attribute__((naked)) void arm_m_iciit_stub(void) | ||
{ | ||
__asm__("udf 0;"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
__asm__("udf 0;"); | |
__asm__("udf #0;"); |
For info, I face some CPU faults on a few test cases, on STM32 boards embedding a Cortex-M33.
|
Targetting Zephyr release 4.4 as per Release WG discussion |
Mostly complete. Supports MPU, userspace, PSPLIM-based stack guards, and FPU/DSP features. ARMv8-M secure mode "should" work but I don't know how to test it.
Designed with an eye to uncompromising/best-in-industry cooperative context switch performance. No PendSV exception nor hardware stacking/unstacking, just a traditional "musical chairs" switch. Context gets saved on process stacks only instead of split between there and the thread struct. No branches in the core integer switch code (and just one in the FPU bits that can't be avoided).
Minimal assembly use; arch_switch() itself is ALWAYS_INLINE, there is an assembly stub for exception exit, and that's it beyond one/two instruction inlines elsewhere.
Selectable at build time, interoperable with existing code. Just use the pre-existing CONFIG_USE_SWITCH=y flag to enable it. Or turn it off to evade regressions as this stabilizes.
Exception/interrupt returns in the common case need only a single C function to be called at the tail, and then return naturally. Effectively "all interrupts are direct now". This isn't a benefit currently because the existing stubs haven't been removed (see # 4), but in the long term we can look at exploiting this. The boilerplate previously required is now (mostly) empty.
No support for ARMv6 (Cortex M0 et. al.) thumb code. The expanded instruction encodings in ARMv7 are a big (big) win, so the older cores really need a separate port to avoid impacting newer hardware. Thankfully there isn't that much code to port (see # 3), so this should be doable.