@woshiluo
This commit adds a flag to check whether an IRQ stack is currently in use.

The EVL module's in-band hardirqs can cause context switches, which could lead to the same IRQ stack being used by two contexts at once. The new flag prevents that by ensuring a stack is not reused until it has been released.

Additionally, this patch fixes a pr_info format string error.

Signed-off-by: Woshiluo Luo <[email protected]>
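For illustration, the in-use flag described above can be sketched with a C11 atomic flag. All names here are hypothetical stand-ins, not the actual EVL or arch code, which manages its own per-CPU IRQ stack structures:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of an IRQ stack guarded by an in-use flag. */
struct irq_stack {
	atomic_flag in_use;
	void *base;
};

/* Claim the stack; returns false if it is already active, so a
 * context switch cannot hand the same stack to a second user. */
static bool irq_stack_try_claim(struct irq_stack *s)
{
	return !atomic_flag_test_and_set(&s->in_use);
}

/* Release the stack once its user is done, making it claimable again. */
static void irq_stack_release(struct irq_stack *s)
{
	atomic_flag_clear(&s->in_use);
}
```

The test-and-set makes claim and marking atomic, so there is no window in which two contexts can both see the stack as free.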
Han-Chen-BC pushed a commit to Han-Chen-BC/visionfive2_linux that referenced this pull request Sep 26, 2025
Checking for a PI/PP boosting mutex is not enough when dropping to
in-band context: owning any mutex in this case would be wrong, since
it would create a priority inversion.

Extend the logic of evl_detect_boost_drop() to encompass any owned
mutex, renaming it to evl_check_no_mutex() for consistency. As a
side-effect, the thread which attempts to switch in-band while owning
mutex(es) now receives a single HMDIAG_LKDEPEND notification, instead
of notifying all waiter(s) sleeping on those mutexes.

As a consequence, we can drop detect_inband_owner() which becomes
redundant as it detects the same issue from the converse side without
extending the test coverage (i.e. a contender would check whether the
mutex owner is running in-band).

This change does affect the behavior of applications that turn on
T_WOLI on waiter threads explicitly. That said, the same issue would
still be detected if CONFIG_EVL_DEBUG_WOLI is set globally, which is
the recommended configuration during the development stage.
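The check described above can be sketched as follows. This is a minimal model, not the EVL implementation: the struct layout and function names are illustrative, and the HMDIAG_LKDEPEND value is a placeholder:

```c
#include <assert.h>
#include <stdbool.h>

#define HMDIAG_LKDEPEND 5	/* illustrative value, not EVL's */

struct thread {
	int owned_mutex_count;	/* every owned mutex, boosting or not */
	int notification;	/* last HM diagnostic posted to the thread */
};

/* Model of the evl_check_no_mutex() logic: a thread switching
 * in-band must own no mutex at all. On failure, a single
 * HMDIAG_LKDEPEND notification goes to the switching thread itself,
 * rather than to every waiter sleeping on its mutexes. */
static bool check_no_mutex(struct thread *t)
{
	if (t->owned_mutex_count > 0) {
		t->notification = HMDIAG_LKDEPEND;
		return false;
	}
	return true;
}
```

Notifying only the switching thread is what lets the converse-side detect_inband_owner() check be dropped: the offender is flagged directly at the point of the transition.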

This change also solves an ABBA issue which existed in the former
implementation:

[   40.976962] ======================================================
[   40.976964] WARNING: possible circular locking dependency detected
[   40.976965] 5.15.77-00716-g8390add2f766 #156 Not tainted
[   40.976968] ------------------------------------------------------
[   40.976969] monitor-pp-lazy/363 is trying to acquire lock:
[   40.976971] ffff99c5c14e5588 (test363.0){....}-{0:0}, at: evl_detect_boost_drop+0x80/0x200
[   40.976987]
[   40.976987] but task is already holding lock:
[   40.976988] ffff99c5c243d818 (monitor-pp-lazy:363){....}-{0:0}, at: evl_detect_boost_drop+0x0/0x200
[   40.976996]
[   40.976996] which lock already depends on the new lock.
[   40.976996]
[   40.976997]
[   40.976997] the existing dependency chain (in reverse order) is:
[   40.976998]
[   40.976998] -> #1 (monitor-pp-lazy:363){....}-{0:0}:
[   40.977003]        fast_grab_mutex+0xca/0x150
[   40.977006]        evl_lock_mutex_timeout+0x60/0xa90
[   40.977009]        monitor_oob_ioctl+0x226/0xed0
[   40.977014]        EVL_ioctl+0x41/0xa0
[   40.977017]        handle_pipelined_syscall+0x3d8/0x490
[   40.977021]        __pipeline_syscall+0xcc/0x2e0
[   40.977026]        pipeline_syscall+0x47/0x120
[   40.977030]        syscall_enter_from_user_mode+0x40/0xa0
[   40.977036]        do_syscall_64+0x15/0xf0
[   40.977039]        entry_SYSCALL_64_after_hwframe+0x61/0xcb
[   40.977044]
[   40.977044] -> #0 (test363.0){....}-{0:0}:
[   40.977048]        __lock_acquire+0x133a/0x2530
[   40.977053]        lock_acquire+0xce/0x2d0
[   40.977056]        evl_detect_boost_drop+0xb0/0x200
[   40.977059]        evl_switch_inband+0x41e/0x540
[   40.977064]        do_oob_syscall+0x1bc/0x3d0
[   40.977067]        handle_pipelined_syscall+0xbe/0x490
[   40.977071]        __pipeline_syscall+0xcc/0x2e0
[   40.977075]        pipeline_syscall+0x47/0x120
[   40.977079]        syscall_enter_from_user_mode+0x40/0xa0
[   40.977083]        do_syscall_64+0x15/0xf0
[   40.977086]        entry_SYSCALL_64_after_hwframe+0x61/0xcb
[   40.977090]
[   40.977090] other info that might help us debug this:
[   40.977090]
[   40.977091]  Possible unsafe locking scenario:
[   40.977091]
[   40.977092]        CPU0                    CPU1
[   40.977093]        ----                    ----
[   40.977094]   lock(monitor-pp-lazy:363);
[   40.977096]                                lock(test363.0);
[   40.977098]                                lock(monitor-pp-lazy:363);
[   40.977100]   lock(test363.0);
[   40.977102]
[   40.977102]  *** DEADLOCK ***
[   40.977102]
[   40.977103] 1 lock held by monitor-pp-lazy/363:
[   40.977105]  #0: ffff99c5c243d818 (monitor-pp-lazy:363){....}-{0:0}, at: evl_detect_boost_drop+0x0/0x200
[   40.977113]

Signed-off-by: Philippe Gerum <[email protected]>
Han-Chen-BC pushed a commit to Han-Chen-BC/visionfive2_linux that referenced this pull request Sep 26, 2025
This reverts commit 5ae263a6ca027c4b79c4dfacfcf4eca8209eefd0, because
disambiguating the per-thread lock is not as useless as it seemed at
first.
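
To see why the disambiguation matters, consider this userspace sketch using plain pthread mutexes in place of EVL's hard spinlocks (hypothetical names). Each thread has its own lock instance, but to lockdep every thread->lock belongs to a single lock class unless the classes are told apart, so nesting two of them looks like recursive locking:

```c
#include <assert.h>
#include <pthread.h>

/* One lock *instance* per thread, but a single lock *class*
 * from lockdep's point of view unless disambiguated. */
struct thread {
	pthread_mutex_t lock;
};

/* Walk from the current thread to a mutex owner: both thread->lock
 * instances are held at once. With distinct instances this is safe,
 * yet lockdep flags it as recursive locking when it cannot tell the
 * waiter's class apart from the owner's. */
static void boost_owner(struct thread *curr, struct thread *owner)
{
	pthread_mutex_lock(&curr->lock);
	pthread_mutex_lock(&owner->lock);
	/* ... priority propagation would happen here ... */
	pthread_mutex_unlock(&owner->lock);
	pthread_mutex_unlock(&curr->lock);
}
```

Reinstating the per-thread lock disambiguation gives each instance its own identity, so the nesting above no longer trips the recursive-locking check.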

Fixes this regression when CONFIG_DEBUG_HARD_SPINLOCKS is on:

[   52.090120]
[   52.090129] ============================================
[   52.090134] WARNING: possible recursive locking detected
[   52.090139] 5.10.199-00830-g18654c202dd6 #118 Not tainted
[   52.090143] --------------------------------------------
[   52.090147] monitor-dlk-A:4/493 is trying to acquire lock:
[   52.090152] c34a7010 (__RAWLOCK(&thread->lock)){-.-.}-{0:0}, at: evl_lock_mutex_timeout+0x4e0/0x870
[   52.090169]
[   52.090173] but task is already holding lock:
[   52.090176] c34a5810 (__RAWLOCK(&thread->lock)){-.-.}-{0:0}, at: evl_lock_mutex_timeout+0x4e0/0x870
[   52.090192]
[   52.090195] other info that might help us debug this:
[   52.090199]  Possible unsafe locking scenario:
[   52.090202]
[   52.090205]        CPU0
[   52.090208]        ----
[   52.090211]   lock(__RAWLOCK(&thread->lock));
[   52.090221]   lock(__RAWLOCK(&thread->lock));
[   52.090229]
[   52.090233]  *** DEADLOCK ***
[   52.090235]
[   52.090239]  May be due to missing lock nesting notation
[   52.090242]
[   52.090246] 2 locks held by monitor-dlk-A:4/493:
[   52.090249]  #0: c2d030d4 (&mon->mutex){....}-{0:0}, at: evl_lock_mutex_timeout+0x104/0x870
[   52.090267]  #1: c34a5810 (__RAWLOCK(&thread->lock)){-.-.}-{0:0}, at: evl_lock_mutex_timeout+0x4e0/0x870

Signed-off-by: Philippe Gerum <[email protected]>
Han-Chen-BC pushed a commit to Han-Chen-BC/visionfive2_linux that referenced this pull request Sep 26, 2025
evl_net_learn_ipv4_route() may be called from softirq context. Make
sure we don't enter fs reclaim by using an atomic allocation instead.

Fixes this lockdep splat:

[  393.641581] ================================
[  393.641584] WARNING: inconsistent lock state
[  393.641588] 6.1.111-00840-g87ef751da8a7-dirty #30 Not tainted
[  393.641594] --------------------------------
[  393.641597] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[  393.641601] swapper/0/0 [HC0[0]:SC1[3]:HE1:SE0] takes:
[  393.641611] c190530c (fs_reclaim){+.?.}-{0:0}, at: __kmem_cache_alloc_node+0x2c/0x204
[  393.641647] {SOFTIRQ-ON-W} state was registered at:
[  393.641651]   fs_reclaim_acquire+0x70/0xa8
[  393.641664]   __kmem_cache_alloc_node+0x2c/0x204
[  393.641673]   kmalloc_node_trace+0x24/0x4c
[  393.641682]   init_rescuer+0x3c/0xe8
[  393.641696]   workqueue_init+0xa0/0x1e4
[  393.641716]   kernel_init_freeable+0x88/0x240
[  393.641733]   kernel_init+0x14/0x140
[  393.641751]   ret_from_fork+0x14/0x1c
[  393.641760] irq event stamp: 280800
[  393.641764] hardirqs last  enabled at (280800): [<c012ff8c>] handle_softirqs+0xa0/0x480
[  393.641784] hardirqs last disabled at (280798): [<c01ab6f4>] sync_current_irq_stage+0x214/0x268
[  393.641799] softirqs last  enabled at (280780): [<c01301c0>] handle_softirqs+0x2d4/0x480
[  393.641813] softirqs last disabled at (280799): [<c013050c>] __irq_exit_rcu+0x144/0x188
[  393.641827]
[  393.641827] other info that might help us debug this:
[  393.641831]  Possible unsafe locking scenario:
[  393.641831]
[  393.641833]        CPU0
[  393.641835]        ----
[  393.641837]   lock(fs_reclaim);
[  393.641843]   <Interrupt>
[  393.641845]     lock(fs_reclaim);
[  393.641852]
[  393.641852]  *** DEADLOCK ***
[  393.641852]
[  393.641853] 2 locks held by swapper/0/0:
[  393.641859]  #0: c18e21c0 (rcu_read_lock){....}-{1:2}, at: netif_receive_skb_list_internal+0xc8/0x3d4
[  393.641891]  #1: c18e21c0 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish+0x64/0x1c8
[  393.641920]
[  393.641920] stack backtrace:
[  393.641924] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.1.111-00840-g87ef751da8a7-dirty #30
[  393.641933] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[  393.641937] IRQ stage: Linux
[  393.641944]  unwind_backtrace from show_stack+0x10/0x14
[  393.641967]  show_stack from dump_stack_lvl+0x94/0xcc
[  393.641986]  dump_stack_lvl from mark_lock.part.0+0x730/0x940
[  393.642004]  mark_lock.part.0 from __lock_acquire+0x978/0x2924
[  393.642016]  __lock_acquire from lock_acquire+0xf8/0x368
[  393.642029]  lock_acquire from fs_reclaim_acquire+0x70/0xa8
[  393.642042]  fs_reclaim_acquire from __kmem_cache_alloc_node+0x2c/0x204
[  393.642058]  __kmem_cache_alloc_node from kmalloc_trace+0x28/0x58
[  393.642072]  kmalloc_trace from evl_net_learn_ipv4_route+0x6c/0x12c
[  393.642095]  evl_net_learn_ipv4_route from ip_route_output_flow+0x5c/0x64
[  393.642113]  ip_route_output_flow from ip_send_unicast_reply+0x144/0x50c
[  393.642132]  ip_send_unicast_reply from tcp_v4_send_reset+0x25c/0x514
[  393.642151]  tcp_v4_send_reset from tcp_v4_rcv+0x98c/0xcdc
[  393.642164]  tcp_v4_rcv from ip_protocol_deliver_rcu+0x3c/0x248
[  393.642178]  ip_protocol_deliver_rcu from ip_local_deliver_finish+0xd0/0x1c8
[  393.642194]  ip_local_deliver_finish from ip_sublist_rcv_finish+0x38/0xa0
[  393.642210]  ip_sublist_rcv_finish from ip_sublist_rcv+0x1e8/0x340
[  393.642225]  ip_sublist_rcv from ip_list_rcv+0xe4/0x2f8
[  393.642240]  ip_list_rcv from __netif_receive_skb_list_core+0x18c/0x1fc
[  393.642258]  __netif_receive_skb_list_core from netif_receive_skb_list_internal+0x1f8/0x3d4
[  393.642275]  netif_receive_skb_list_internal from net_rx_action+0xe0/0x3cc
[  393.642291]  net_rx_action from handle_softirqs+0xdc/0x480
[  393.642312]  handle_softirqs from __irq_exit_rcu+0x144/0x188
[  393.642332]  __irq_exit_rcu from irq_exit+0x8/0x28
[  393.642352]  irq_exit from arch_do_IRQ_pipelined+0x30/0x64
[  393.642372]  arch_do_IRQ_pipelined from sync_current_irq_stage+0x160/0x268
[  393.642386]  sync_current_irq_stage from __inband_irq_enable+0x48/0x54
[  393.642400]  __inband_irq_enable from cpuidle_enter_state+0x198/0x3e8
[  393.642421]  cpuidle_enter_state from cpuidle_enter+0x30/0x40
[  393.642436]  cpuidle_enter from do_idle+0x1e0/0x2ac
[  393.642460]  do_idle from cpu_startup_entry+0x28/0x2c
[  393.642481]  cpu_startup_entry from rest_init+0xd4/0x188
[  393.642504]  rest_init from arch_post_acpi_subsys_init+0x0/0x8

Signed-off-by: Philippe Gerum <[email protected]>
Signed-off-by: Woshiluo Luo <[email protected]>