@cding-ddn

Add a PAGE_MKWRITE FUSE request so that the FUSE daemon can acquire a DLM lock protecting dirty page creation.

Allow read_folio to return an EAGAIN error and translate it to AOP_TRUNCATED_PAGE so that page fault and read operations are retried. This prevents a deadlock caused by folio lock/DLM lock order reversal:

  • Fault or read operations acquire the folio lock first, then the DLM lock.
  • The FUSE daemon blocks new DLM lock acquisition while it is invalidating the page cache, and invalidate_inode_pages2_range() acquires the folio lock.

To prevent deadlock, the FUSE daemon will fail its DLM lock acquisition with EAGAIN if it detects an in-flight page cache invalidation.

This enables memory mapping across cluster nodes with proper distributed locking coordination.
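
The retry scheme described above can be modeled in userspace. This is a minimal single-threaded sketch, not FUSE or kernel code: struct model_inode, daemon_acquire_dlm(), model_read_folio(), model_fault(), and finish_invalidation() are hypothetical names, and only the AOP_TRUNCATED_PAGE value mirrors the kernel's enum. It assumes, as the ->read_folio contract allows, that read_folio unlocks the folio before returning AOP_TRUNCATED_PAGE, so the daemon's invalidation can take the folio lock before the fault is retried.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the kernel's AOP_TRUNCATED_PAGE value; everything else is a model. */
#define AOP_TRUNCATED_PAGE 0x80001

/* Hypothetical per-inode state shared by the fault path and the daemon. */
struct model_inode {
	bool invalidating;   /* daemon is inside invalidate_inode_pages2_range() */
	bool folio_locked;   /* folio lock held by the fault/read path */
	int dlm_attempts;    /* DLM lock requests sent to the daemon */
};

/* Daemon side: refuse new DLM locks while invalidation is in flight,
 * instead of blocking (blocking here is what would deadlock). */
static int daemon_acquire_dlm(struct model_inode *ino)
{
	ino->dlm_attempts++;
	return ino->invalidating ? -EAGAIN : 0;
}

/* Entered with the folio locked; unlocks it on every return path. */
static int model_read_folio(struct model_inode *ino)
{
	int err = daemon_acquire_dlm(ino);   /* folio lock -> DLM lock order */

	ino->folio_locked = false;
	return err == -EAGAIN ? AOP_TRUNCATED_PAGE : err;
}

/* Daemon side: invalidation can only finish once it can take the folio lock. */
static void finish_invalidation(struct model_inode *ino)
{
	if (!ino->folio_locked)
		ino->invalidating = false;
}

/* Fault/read path: relock and retry while read_folio asks for it.
 * on_retry models the daemon running while the folio is unlocked. */
static int model_fault(struct model_inode *ino,
		       void (*on_retry)(struct model_inode *), int max_tries)
{
	assert(max_tries > 0);
	for (int i = 0; i < max_tries; i++) {
		ino->folio_locked = true;
		int ret = model_read_folio(ino);
		if (ret != AOP_TRUNCATED_PAGE)
			return ret;
		if (on_retry)
			on_retry(ino);
	}
	return -EAGAIN;   /* the cap exists only so the model terminates */
}
```

With invalidation in flight, the first attempt returns AOP_TRUNCATED_PAGE, the unlocked folio lets invalidation finish, and the second attempt gets the DLM lock: model_fault() on an inode with invalidating set and finish_invalidation as the hook returns 0 after two DLM requests.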

@cding-ddn cding-ddn requested a review from bsbernd July 7, 2025 17:47
    unsigned int no_mkwrite:1;

    /* Use io_uring for communication */
    unsigned int io_uring;

Interesting, needs to be switched to io_uring:1

(unrelated, but had slipped through so far)
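
A minimal illustration of the reviewer's point, using hypothetical structs rather than the actual FUSE connection struct: neighboring flags such as no_mkwrite are 1-bit bit-fields, so declaring the new flag as io_uring:1 lets it share their storage unit, whereas a plain unsigned int occupies a full unit of its own. Bit-field layout is implementation-defined in C; the static assertion holds on common ABIs such as x86-64 and arm64 with GCC or Clang.

```c
#include <assert.h>

/* Hypothetical struct with the flag declared as a full unsigned int. */
struct flags_plain {
	unsigned int no_mkwrite:1;
	unsigned int io_uring;      /* not a bit-field: needs its own unsigned int */
};

/* Hypothetical struct with the flag declared as the reviewer suggests. */
struct flags_packed {
	unsigned int no_mkwrite:1;
	unsigned int io_uring:1;    /* packs next to no_mkwrite */
};

/* Holds on common ABIs; not guaranteed by the C standard. */
static_assert(sizeof(struct flags_packed) == sizeof(unsigned int),
	      "both flags fit in one unsigned int");
```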

bsbernd commented Jul 7, 2025

The disadvantage of this approach is that we get a PAGE_MKWRITE request for every page, which will be expensive.
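
To put the concern in numbers: assuming 4 KiB pages and a mapping whose pages all start clean and write-protected, each page costs one upcall the first time it is written, so dirtying a 1 GiB region means 2^30 / 2^12 = 2^18 = 262,144 round-trips to the daemon. The hypothetical helper below counts the pages a single write spans.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: how many distinct pages a write of len bytes at
 * byte offset off touches. If every page starts clean and write-protected,
 * that is also how many page_mkwrite upcalls the write triggers. */
static size_t mkwrite_requests(size_t off, size_t len, size_t page_size)
{
	assert(page_size > 0);
	if (len == 0)
		return 0;
	/* index of the last page touched minus index of the first, inclusive */
	return (off + len - 1) / page_size - off / page_size + 1;
}
```

For example, mkwrite_requests(0, (size_t)1 << 30, 4096) returns 262144, and even a 2-byte write at offset 4095 costs two upcalls because it straddles a page boundary.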

bsbernd commented Jul 7, 2025

Needs a "Signed-off-by"

@yongzech yongzech self-requested a review July 8, 2025 05:35
@cding-ddn cding-ddn force-pushed the mkwrite-noble-6.8.0-58.60 branch from e77b0a5 to b6ebff1 July 11, 2025 09:14
@cding-ddn cding-ddn force-pushed the mkwrite-noble-6.8.0-58.60 branch from b6ebff1 to 989869f July 16, 2025 08:06
@cding-ddn cding-ddn changed the title from "fuse: add PAGE_MKWRITE opcode" to "fuse: multi-node mmap support" Jul 17, 2025
@cding-ddn cding-ddn force-pushed the mkwrite-noble-6.8.0-58.60 branch from 989869f to af8a424 July 17, 2025 17:08

bsbernd commented Jul 18, 2025

@cding-ddn I can't merge because there are conflicts. I think the first patch in the series is already merged.

Renumber the operation code to a high value to avoid conflicts with upstream.
Send a DLM_WB_LOCK request in the page_mkwrite handler to enable FUSE
filesystems to acquire a distributed lock manager (DLM) lock for
protecting upcoming dirty pages when a previously read-only mapped
page is about to be written.

Signed-off-by: Cheng Ding <[email protected]>
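
The write-fault behavior this commit hooks into can be sketched as a userspace model. It is a simplified sketch under stated assumptions, not the FUSE implementation: struct model_mapping, model_page_mkwrite(), model_write(), and model_writeback() are hypothetical, and the counter stands in for DLM_WB_LOCK requests sent to the daemon. It assumes a shared mapping in which clean pages are write-protected, so only the first store to each clean page reaches page_mkwrite, and writeback write-protects pages again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NPAGES 4

/* Hypothetical shared file mapping: each page is either write-protected
 * (clean) or writable (dirty). */
struct model_mapping {
	bool writable[MODEL_NPAGES];
	unsigned int dlm_wb_lock_requests;  /* stand-in for DLM_WB_LOCK upcalls */
};

/* Stand-in for the page_mkwrite handler: ask the daemon for a DLM
 * write-back lock before the page may become dirty. */
static int model_page_mkwrite(struct model_mapping *m, size_t page)
{
	m->dlm_wb_lock_requests++;   /* one round-trip to the daemon */
	m->writable[page] = true;
	return 0;
}

/* A store to a page: only write-protected pages fault into page_mkwrite. */
static void model_write(struct model_mapping *m, size_t page)
{
	assert(page < MODEL_NPAGES);
	if (!m->writable[page])
		model_page_mkwrite(m, page);
}

/* Writeback cleans every page and write-protects it again, so the next
 * store faults back into page_mkwrite. */
static void model_writeback(struct model_mapping *m)
{
	for (size_t i = 0; i < MODEL_NPAGES; i++)
		m->writable[i] = false;
}
```

Two stores to page 0 cost one request, a store to page 1 costs a second, and after model_writeback() a store to page 0 costs a third. This per-page cost is the one raised in review.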
Allow read_folio to return an EAGAIN error and translate it to
AOP_TRUNCATED_PAGE so that page fault and read operations are retried.
This prevents a deadlock caused by folio lock/DLM lock order reversal:
 - Fault or read operations acquire the folio lock first, then the DLM
   lock.
 - The FUSE daemon blocks new DLM lock acquisition while it is
   invalidating the page cache, and invalidate_inode_pages2_range()
   acquires the folio lock.
To prevent deadlock, the FUSE daemon will fail its DLM lock acquisition
with EAGAIN if it detects an in-flight page cache invalidation.

Signed-off-by: Cheng Ding <[email protected]>
@cding-ddn cding-ddn force-pushed the mkwrite-noble-6.8.0-58.60 branch from af8a424 to 8ecf118 July 18, 2025 10:42
@cding-ddn

@bernd, I did a rebase; it can be merged now.

@bsbernd bsbernd merged commit 391f71c into DDNStorage:redfs-ubuntu-noble-6.8.0-58.60 Jul 18, 2025
@cding-ddn cding-ddn deleted the mkwrite-noble-6.8.0-58.60 branch September 23, 2025 18:13
bsbernd pushed a commit that referenced this pull request Nov 7, 2025
jira LE-1907
Rebuild_History Non-Buildable kernel-5.14.0-427.37.1.el9_4
commit-author Dawid Osuchowski <[email protected]>
commit d11a676

Ethtool callbacks can be executed while reset is in progress and try to
access deleted resources, e.g. getting coalesce settings can result in a
NULL pointer dereference seen below.

Reproduction steps:
Once the driver is fully initialized, trigger reset:
	# echo 1 > /sys/class/net/<interface>/device/reset
While the reset is in progress, try to get the coalesce settings using ethtool:
	# ethtool -c <interface>

BUG: kernel NULL pointer dereference, address: 0000000000000020
PGD 0 P4D 0
Oops: Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 11 PID: 19713 Comm: ethtool Tainted: G S                 6.10.0-rc7+ #7
RIP: 0010:ice_get_q_coalesce+0x2e/0xa0 [ice]
RSP: 0018:ffffbab1e9bcf6a8 EFLAGS: 00010206
RAX: 000000000000000c RBX: ffff94512305b028 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9451c3f2e588 RDI: ffff9451c3f2e588
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff9451c3f2e580 R11: 000000000000001f R12: ffff945121fa9000
R13: ffffbab1e9bcf760 R14: 0000000000000013 R15: ffffffff9e65dd40
FS:  00007faee5fbe740(0000) GS:ffff94546fd80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000020 CR3: 0000000106c2e005 CR4: 00000000001706f0
Call Trace:
<TASK>
ice_get_coalesce+0x17/0x30 [ice]
coalesce_prepare_data+0x61/0x80
ethnl_default_doit+0xde/0x340
genl_family_rcv_msg_doit+0xf2/0x150
genl_rcv_msg+0x1b3/0x2c0
netlink_rcv_skb+0x5b/0x110
genl_rcv+0x28/0x40
netlink_unicast+0x19c/0x290
netlink_sendmsg+0x222/0x490
__sys_sendto+0x1df/0x1f0
__x64_sys_sendto+0x24/0x30
do_syscall_64+0x82/0x160
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7faee60d8e27

Calling netif_device_detach() before the reset makes the net core stop
calling into the driver when an ethtool command is issued. An attempt to
execute an ethtool command during reset then results in the following
message:

    netlink error: No such device

instead of a NULL pointer dereference. Once the reset is done and
ice_rebuild() is executing, netif_device_attach() is called so that
ethtool operations can occur again safely.
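
The detach/attach ordering described above can be modeled in userspace as a presence flag that the core checks before calling into the driver. This is a hypothetical sketch, not the net core or the ice driver: struct model_netdev and the model_* functions are illustrative names, with model_reset_begin() and model_rebuild_done() standing in for the netif_device_detach() and netif_device_attach() calls around the reset.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device: presence flag plus driver state freed during reset. */
struct model_netdev {
	bool present;         /* cleared before reset, set again after rebuild */
	const int *coalesce;  /* driver resource; NULL while reset is in progress */
};

/* Driver callback: assumes its resources exist. The real bug was a NULL
 * pointer dereference on this kind of path. */
static int model_driver_get_coalesce(const struct model_netdev *dev, int *out)
{
	assert(dev->coalesce != NULL);
	*out = *dev->coalesce;
	return 0;
}

/* Core: never calls into a detached driver; ethtool reports ENODEV. */
static int model_core_get_coalesce(const struct model_netdev *dev, int *out)
{
	if (!dev->present)
		return -ENODEV;               /* "No such device" */
	return model_driver_get_coalesce(dev, out);
}

static void model_reset_begin(struct model_netdev *dev)
{
	dev->present = false;   /* netif_device_detach() analogue: detach first */
	dev->coalesce = NULL;   /* then tear down resources */
}

static void model_rebuild_done(struct model_netdev *dev, const int *fresh)
{
	dev->coalesce = fresh;  /* rebuild resources first */
	dev->present = true;    /* netif_device_attach() analogue: attach last */
}
```

The key design point is ordering: the device is marked absent before resources are torn down and marked present only after they are rebuilt, so the core never dispatches into a half-initialized driver.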

Fixes: fcea6f3 ("ice: Add stats and ethtool support")
	Suggested-by: Jakub Kicinski <[email protected]>
	Reviewed-by: Igor Bagnucki <[email protected]>
	Signed-off-by: Dawid Osuchowski <[email protected]>
	Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
	Reviewed-by: Michal Schmidt <[email protected]>
	Signed-off-by: Tony Nguyen <[email protected]>
(cherry picked from commit d11a676)
	Signed-off-by: Jonathan Maple <[email protected]>