@woshiluo
This commit adds a flag to check whether an IRQ stack is currently in use.

The EVL module's in-band hardirqs can cause context switches, which could lead to the same IRQ stack being used by two contexts at once. The new flag prevents that by ensuring a stack is not reused until it has been released.

Additionally, this patch fixes a pr_info format string error.

Signed-off-by: Woshiluo Luo <[email protected]>
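For illustration, the in-use flag described above can be sketched with a C11 atomic flag. All names here are hypothetical stand-ins, not the actual EVL or arch code, which manages its own per-CPU IRQ stack structures:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of an IRQ stack guarded by an in-use flag. */
struct irq_stack {
	atomic_flag in_use;
	void *base;
};

/* Claim the stack; returns false if it is already active, so a
 * context switch cannot hand the same stack to a second user. */
static bool irq_stack_try_claim(struct irq_stack *s)
{
	return !atomic_flag_test_and_set(&s->in_use);
}

/* Release the stack once its user is done, making it claimable again. */
static void irq_stack_release(struct irq_stack *s)
{
	atomic_flag_clear(&s->in_use);
}
```

The test-and-set makes claim and marking atomic, so there is no window in which two contexts can both see the stack as free.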
Han-Chen-BC pushed a commit to Han-Chen-BC/visionfive2_linux that referenced this pull request Sep 26, 2025
Checking for a PI/PP boosting mutex is not enough when dropping to
in-band context: owning any mutex in this case would be wrong, since
it would create a priority inversion.

Extend the logic of evl_detect_boost_drop() to encompass any owned
mutex, renaming it to evl_check_no_mutex() for consistency. As a
side-effect, the thread which attempts to switch in-band while owning
mutex(es) now receives a single HMDIAG_LKDEPEND notification, instead
of notifying all waiter(s) sleeping on those mutexes.

As a consequence, we can drop detect_inband_owner() which becomes
redundant as it detects the same issue from the converse side without
extending the test coverage (i.e. a contender would check whether the
mutex owner is running in-band).

This change does affect the behavior of applications that turn on
T_WOLI on waiter threads explicitly. That said, the same issue would
still be detected if CONFIG_EVL_DEBUG_WOLI is set globally, which is
the recommended configuration during the development stage.
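The check described above can be sketched as follows. This is a minimal model, not the EVL implementation: the struct layout and function names are illustrative, and the HMDIAG_LKDEPEND value is a placeholder:

```c
#include <assert.h>
#include <stdbool.h>

#define HMDIAG_LKDEPEND 5	/* illustrative value, not EVL's */

struct thread {
	int owned_mutex_count;	/* every owned mutex, boosting or not */
	int notification;	/* last HM diagnostic posted to the thread */
};

/* Model of the evl_check_no_mutex() logic: a thread switching
 * in-band must own no mutex at all. On failure, a single
 * HMDIAG_LKDEPEND notification goes to the switching thread itself,
 * rather than to every waiter sleeping on its mutexes. */
static bool check_no_mutex(struct thread *t)
{
	if (t->owned_mutex_count > 0) {
		t->notification = HMDIAG_LKDEPEND;
		return false;
	}
	return true;
}
```

Notifying only the switching thread is what lets the converse-side detect_inband_owner() check be dropped: the offender is flagged directly at the point of the transition.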

This change also solves an ABBA issue which existed in the former
implementation:

[   40.976962] ======================================================
[   40.976964] WARNING: possible circular locking dependency detected
[   40.976965] 5.15.77-00716-g8390add2f766 #156 Not tainted
[   40.976968] ------------------------------------------------------
[   40.976969] monitor-pp-lazy/363 is trying to acquire lock:
[   40.976971] ffff99c5c14e5588 (test363.0){....}-{0:0}, at: evl_detect_boost_drop+0x80/0x200
[   40.976987]
[   40.976987] but task is already holding lock:
[   40.976988] ffff99c5c243d818 (monitor-pp-lazy:363){....}-{0:0}, at: evl_detect_boost_drop+0x0/0x200
[   40.976996]
[   40.976996] which lock already depends on the new lock.
[   40.976996]
[   40.976997]
[   40.976997] the existing dependency chain (in reverse order) is:
[   40.976998]
[   40.976998] -> #1 (monitor-pp-lazy:363){....}-{0:0}:
[   40.977003]        fast_grab_mutex+0xca/0x150
[   40.977006]        evl_lock_mutex_timeout+0x60/0xa90
[   40.977009]        monitor_oob_ioctl+0x226/0xed0
[   40.977014]        EVL_ioctl+0x41/0xa0
[   40.977017]        handle_pipelined_syscall+0x3d8/0x490
[   40.977021]        __pipeline_syscall+0xcc/0x2e0
[   40.977026]        pipeline_syscall+0x47/0x120
[   40.977030]        syscall_enter_from_user_mode+0x40/0xa0
[   40.977036]        do_syscall_64+0x15/0xf0
[   40.977039]        entry_SYSCALL_64_after_hwframe+0x61/0xcb
[   40.977044]
[   40.977044] -> #0 (test363.0){....}-{0:0}:
[   40.977048]        __lock_acquire+0x133a/0x2530
[   40.977053]        lock_acquire+0xce/0x2d0
[   40.977056]        evl_detect_boost_drop+0xb0/0x200
[   40.977059]        evl_switch_inband+0x41e/0x540
[   40.977064]        do_oob_syscall+0x1bc/0x3d0
[   40.977067]        handle_pipelined_syscall+0xbe/0x490
[   40.977071]        __pipeline_syscall+0xcc/0x2e0
[   40.977075]        pipeline_syscall+0x47/0x120
[   40.977079]        syscall_enter_from_user_mode+0x40/0xa0
[   40.977083]        do_syscall_64+0x15/0xf0
[   40.977086]        entry_SYSCALL_64_after_hwframe+0x61/0xcb
[   40.977090]
[   40.977090] other info that might help us debug this:
[   40.977090]
[   40.977091]  Possible unsafe locking scenario:
[   40.977091]
[   40.977092]        CPU0                    CPU1
[   40.977093]        ----                    ----
[   40.977094]   lock(monitor-pp-lazy:363);
[   40.977096]                                lock(test363.0);
[   40.977098]                                lock(monitor-pp-lazy:363);
[   40.977100]   lock(test363.0);
[   40.977102]
[   40.977102]  *** DEADLOCK ***
[   40.977102]
[   40.977103] 1 lock held by monitor-pp-lazy/363:
[   40.977105]  #0: ffff99c5c243d818 (monitor-pp-lazy:363){....}-{0:0}, at: evl_detect_boost_drop+0x0/0x200
[   40.977113]

Signed-off-by: Philippe Gerum <[email protected]>
Han-Chen-BC pushed a commit to Han-Chen-BC/visionfive2_linux that referenced this pull request Sep 26, 2025
This reverts commit 5ae263a6ca027c4b79c4dfacfcf4eca8209eefd0, because
disambiguating the per-thread lock is not as useless as it seemed at
first.
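
To see why the disambiguation matters, consider this userspace sketch using plain pthread mutexes in place of EVL's hard spinlocks (hypothetical names). Each thread has its own lock instance, but to lockdep every thread->lock belongs to a single lock class unless the classes are told apart, so nesting two of them looks like recursive locking:

```c
#include <assert.h>
#include <pthread.h>

/* One lock *instance* per thread, but a single lock *class*
 * from lockdep's point of view unless disambiguated. */
struct thread {
	pthread_mutex_t lock;
};

/* Walk from the current thread to a mutex owner: both thread->lock
 * instances are held at once. With distinct instances this is safe,
 * yet lockdep flags it as recursive locking when it cannot tell the
 * waiter's class apart from the owner's. */
static void boost_owner(struct thread *curr, struct thread *owner)
{
	pthread_mutex_lock(&curr->lock);
	pthread_mutex_lock(&owner->lock);
	/* ... priority propagation would happen here ... */
	pthread_mutex_unlock(&owner->lock);
	pthread_mutex_unlock(&curr->lock);
}
```

Reinstating the per-thread lock disambiguation gives each instance its own identity, so the nesting above no longer trips the recursive-locking check.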

Fixes this regression when CONFIG_DEBUG_HARD_SPINLOCKS is on:

[   52.090120]
[   52.090129] ============================================
[   52.090134] WARNING: possible recursive locking detected
[   52.090139] 5.10.199-00830-g18654c202dd6 #118 Not tainted
[   52.090143] --------------------------------------------
[   52.090147] monitor-dlk-A:4/493 is trying to acquire lock:
[   52.090152] c34a7010 (__RAWLOCK(&thread->lock)){-.-.}-{0:0}, at: evl_lock_mutex_timeout+0x4e0/0x870
[   52.090169]
[   52.090173] but task is already holding lock:
[   52.090176] c34a5810 (__RAWLOCK(&thread->lock)){-.-.}-{0:0}, at: evl_lock_mutex_timeout+0x4e0/0x870
[   52.090192]
[   52.090195] other info that might help us debug this:
[   52.090199]  Possible unsafe locking scenario:
[   52.090202]
[   52.090205]        CPU0
[   52.090208]        ----
[   52.090211]   lock(__RAWLOCK(&thread->lock));
[   52.090221]   lock(__RAWLOCK(&thread->lock));
[   52.090229]
[   52.090233]  *** DEADLOCK ***
[   52.090235]
[   52.090239]  May be due to missing lock nesting notation
[   52.090242]
[   52.090246] 2 locks held by monitor-dlk-A:4/493:
[   52.090249]  #0: c2d030d4 (&mon->mutex){....}-{0:0}, at: evl_lock_mutex_timeout+0x104/0x870
[   52.090267]  #1: c34a5810 (__RAWLOCK(&thread->lock)){-.-.}-{0:0}, at: evl_lock_mutex_timeout+0x4e0/0x870

Signed-off-by: Philippe Gerum <[email protected]>
Han-Chen-BC pushed a commit to Han-Chen-BC/visionfive2_linux that referenced this pull request Sep 26, 2025
evl_net_learn_ipv4_route() may be called from softirq context. Make
sure we don't enter fs reclaim by using an atomic allocation instead.

Fixes this lockdep splat:

[  393.641581] ================================
[  393.641584] WARNING: inconsistent lock state
[  393.641588] 6.1.111-00840-g87ef751da8a7-dirty #30 Not tainted
[  393.641594] --------------------------------
[  393.641597] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[  393.641601] swapper/0/0 [HC0[0]:SC1[3]:HE1:SE0] takes:
[  393.641611] c190530c (fs_reclaim){+.?.}-{0:0}, at: __kmem_cache_alloc_node+0x2c/0x204
[  393.641647] {SOFTIRQ-ON-W} state was registered at:
[  393.641651]   fs_reclaim_acquire+0x70/0xa8
[  393.641664]   __kmem_cache_alloc_node+0x2c/0x204
[  393.641673]   kmalloc_node_trace+0x24/0x4c
[  393.641682]   init_rescuer+0x3c/0xe8
[  393.641696]   workqueue_init+0xa0/0x1e4
[  393.641716]   kernel_init_freeable+0x88/0x240
[  393.641733]   kernel_init+0x14/0x140
[  393.641751]   ret_from_fork+0x14/0x1c
[  393.641760] irq event stamp: 280800
[  393.641764] hardirqs last  enabled at (280800): [<c012ff8c>] handle_softirqs+0xa0/0x480
[  393.641784] hardirqs last disabled at (280798): [<c01ab6f4>] sync_current_irq_stage+0x214/0x268
[  393.641799] softirqs last  enabled at (280780): [<c01301c0>] handle_softirqs+0x2d4/0x480
[  393.641813] softirqs last disabled at (280799): [<c013050c>] __irq_exit_rcu+0x144/0x188
[  393.641827]
[  393.641827] other info that might help us debug this:
[  393.641831]  Possible unsafe locking scenario:
[  393.641831]
[  393.641833]        CPU0
[  393.641835]        ----
[  393.641837]   lock(fs_reclaim);
[  393.641843]   <Interrupt>
[  393.641845]     lock(fs_reclaim);
[  393.641852]
[  393.641852]  *** DEADLOCK ***
[  393.641852]
[  393.641853] 2 locks held by swapper/0/0:
[  393.641859]  #0: c18e21c0 (rcu_read_lock){....}-{1:2}, at: netif_receive_skb_list_internal+0xc8/0x3d4
[  393.641891]  #1: c18e21c0 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish+0x64/0x1c8
[  393.641920]
[  393.641920] stack backtrace:
[  393.641924] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.1.111-00840-g87ef751da8a7-dirty #30
[  393.641933] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[  393.641937] IRQ stage: Linux
[  393.641944]  unwind_backtrace from show_stack+0x10/0x14
[  393.641967]  show_stack from dump_stack_lvl+0x94/0xcc
[  393.641986]  dump_stack_lvl from mark_lock.part.0+0x730/0x940
[  393.642004]  mark_lock.part.0 from __lock_acquire+0x978/0x2924
[  393.642016]  __lock_acquire from lock_acquire+0xf8/0x368
[  393.642029]  lock_acquire from fs_reclaim_acquire+0x70/0xa8
[  393.642042]  fs_reclaim_acquire from __kmem_cache_alloc_node+0x2c/0x204
[  393.642058]  __kmem_cache_alloc_node from kmalloc_trace+0x28/0x58
[  393.642072]  kmalloc_trace from evl_net_learn_ipv4_route+0x6c/0x12c
[  393.642095]  evl_net_learn_ipv4_route from ip_route_output_flow+0x5c/0x64
[  393.642113]  ip_route_output_flow from ip_send_unicast_reply+0x144/0x50c
[  393.642132]  ip_send_unicast_reply from tcp_v4_send_reset+0x25c/0x514
[  393.642151]  tcp_v4_send_reset from tcp_v4_rcv+0x98c/0xcdc
[  393.642164]  tcp_v4_rcv from ip_protocol_deliver_rcu+0x3c/0x248
[  393.642178]  ip_protocol_deliver_rcu from ip_local_deliver_finish+0xd0/0x1c8
[  393.642194]  ip_local_deliver_finish from ip_sublist_rcv_finish+0x38/0xa0
[  393.642210]  ip_sublist_rcv_finish from ip_sublist_rcv+0x1e8/0x340
[  393.642225]  ip_sublist_rcv from ip_list_rcv+0xe4/0x2f8
[  393.642240]  ip_list_rcv from __netif_receive_skb_list_core+0x18c/0x1fc
[  393.642258]  __netif_receive_skb_list_core from netif_receive_skb_list_internal+0x1f8/0x3d4
[  393.642275]  netif_receive_skb_list_internal from net_rx_action+0xe0/0x3cc
[  393.642291]  net_rx_action from handle_softirqs+0xdc/0x480
[  393.642312]  handle_softirqs from __irq_exit_rcu+0x144/0x188
[  393.642332]  __irq_exit_rcu from irq_exit+0x8/0x28
[  393.642352]  irq_exit from arch_do_IRQ_pipelined+0x30/0x64
[  393.642372]  arch_do_IRQ_pipelined from sync_current_irq_stage+0x160/0x268
[  393.642386]  sync_current_irq_stage from __inband_irq_enable+0x48/0x54
[  393.642400]  __inband_irq_enable from cpuidle_enter_state+0x198/0x3e8
[  393.642421]  cpuidle_enter_state from cpuidle_enter+0x30/0x40
[  393.642436]  cpuidle_enter from do_idle+0x1e0/0x2ac
[  393.642460]  do_idle from cpu_startup_entry+0x28/0x2c
[  393.642481]  cpu_startup_entry from rest_init+0xd4/0x188
[  393.642504]  rest_init from arch_post_acpi_subsys_init+0x0/0x8

Signed-off-by: Philippe Gerum <[email protected]>
Signed-off-by: Woshiluo Luo <[email protected]>