
Commit 7677f7f

CmdrMoozy authored and torvalds committed
userfaultfd: add minor fault registration mode
Patch series "userfaultfd: add minor fault handling", v9.

Overview
========

This series adds a new userfaultfd feature, UFFD_FEATURE_MINOR_HUGETLBFS.
When enabled (via the UFFDIO_API ioctl), this feature means that any
hugetlbfs VMAs registered with UFFDIO_REGISTER_MODE_MISSING will *also*
get events for "minor" faults. By "minor" fault, I mean the following
situation:

Let there exist two mappings (i.e., VMAs) to the same page(s) (shared
memory). One of the mappings is registered with userfaultfd (in minor
mode), and the other is not. Via the non-UFFD mapping, the underlying
pages have already been allocated & filled with some contents. The UFFD
mapping has not yet been faulted in; when it is touched for the first
time, this results in what I'm calling a "minor" fault. As a concrete
example, when working with hugetlbfs, we have huge_pte_none(), but
find_lock_page() finds an existing page.

We also add a new ioctl to resolve such faults: UFFDIO_CONTINUE. The idea
is, userspace resolves the fault by either a) doing nothing if the
contents are already correct, or b) updating the underlying contents using
the second, non-UFFD mapping (via memcpy/memset or similar, or something
fancier like RDMA, or etc...). In either case, userspace issues
UFFDIO_CONTINUE to tell the kernel "I have ensured the page contents are
correct, carry on setting up the mapping".

Use Case
========

Consider the use case of VM live migration (e.g. under QEMU/KVM):

1. While a VM is still running, we copy the contents of its memory to a
   target machine. The pages are populated on the target by writing to
   the non-UFFD mapping, using the setup described above. The VM is still
   running (and therefore its memory is likely changing), so this may be
   repeated several times, until we decide the target is "up to date
   enough".

2. We pause the VM on the source, and start executing on the target
   machine. During this gap, the VM's user(s) will *see* a pause, so it
   is desirable to minimize this window.

3. Between the last time any page was copied from the source to the
   target, and when the VM was paused, the contents of that page may have
   changed - and therefore the copy we have on the target machine is out
   of date. Although we can keep track of which pages are out of date,
   for VMs with large amounts of memory, it is "slow" to transfer this
   information to the target machine. We want to resume execution before
   such a transfer would complete.

4. So, the guest begins executing on the target machine. The first time
   it touches its memory (via the UFFD-registered mapping), userspace
   wants to intercept this fault. Userspace checks whether or not the
   page is up to date, and if not, copies the updated page from the
   source machine, via the non-UFFD mapping. Finally, whether a copy was
   performed or not, userspace issues a UFFDIO_CONTINUE ioctl to tell the
   kernel "I have ensured the page contents are correct, carry on setting
   up the mapping".

We don't have to do all of the final updates on-demand. The userfaultfd
manager can, in the background, also copy over updated pages once it
receives the map of which pages are up-to-date or not.

Interaction with Existing APIs
==============================

Because this is a feature, a registered VMA could potentially receive both
missing and minor faults. I spent some time thinking through how the
existing API interacts with the new feature:

UFFDIO_CONTINUE cannot be used to resolve non-minor faults, as it does not
allocate a new page. If UFFDIO_CONTINUE is used on a non-minor fault:

- For non-shared memory or shmem, -EINVAL is returned.
- For hugetlb, -EFAULT is returned.

UFFDIO_COPY and UFFDIO_ZEROPAGE cannot be used to resolve minor faults.
Without modifications, the existing codepath assumes a new page needs to
be allocated. This is okay, since userspace must have a second
non-UFFD-registered mapping anyway, thus there isn't much reason to want
to use these in any case (just memcpy or memset or similar).

- If UFFDIO_COPY is used on a minor fault, -EEXIST is returned.
- If UFFDIO_ZEROPAGE is used on a minor fault, -EEXIST is returned (or
  -EINVAL in the case of hugetlb, as UFFDIO_ZEROPAGE is unsupported in any
  case).
- UFFDIO_WRITEPROTECT simply doesn't work with shared memory, and returns
  -ENOENT in that case (regardless of the kind of fault).

Future Work
===========

This series only supports hugetlbfs. I have a second series in flight to
support shmem as well, extending the functionality. This series is more
mature than the shmem support at this point, and the functionality works
fully on hugetlbfs, so this series can be merged first and then shmem
support will follow.

This patch (of 6):

This feature allows userspace to intercept "minor" faults. By "minor"
faults, I mean the following situation:

Let there exist two mappings (i.e., VMAs) to the same page(s). One of the
mappings is registered with userfaultfd (in minor mode), and the other is
not. Via the non-UFFD mapping, the underlying pages have already been
allocated & filled with some contents. The UFFD mapping has not yet been
faulted in; when it is touched for the first time, this results in what
I'm calling a "minor" fault. As a concrete example, when working with
hugetlbfs, we have huge_pte_none(), but find_lock_page() finds an existing
page.

This commit adds the new registration mode, and sets the relevant flag on
the VMAs being registered. In the hugetlb fault path, if we find that we
have huge_pte_none(), but find_lock_page() does indeed find an existing
page, then we have a "minor" fault, and if the VMA has the userfaultfd
registration flag, we call into userfaultfd to handle it.

This is implemented as a new registration mode, instead of an API feature.
This is because the alternative implementation has significant drawbacks
[1].

However, doing it this way requires that we allocate a VM_* flag for the
new registration mode. On 32-bit systems, there are no unused bits, so
this feature is only supported on architectures with
CONFIG_ARCH_USES_HIGH_VMA_FLAGS. When attempting to register a VMA in
MINOR mode on 32-bit architectures, we return -EINVAL.

[1] https://lore.kernel.org/patchwork/patch/1380226/

[[email protected]: fix minor fault page leak]
  Link: https://lkml.kernel.org/r/[email protected]

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Axel Rasmussen <[email protected]>
Reviewed-by: Peter Xu <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chinwen Chang <[email protected]>
Cc: Huang Ying <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: Lokesh Gidra <[email protected]>
Cc: "Matthew Wilcox (Oracle)" <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: "Michal Koutn" <[email protected]>
Cc: Michel Lespinasse <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Shaohua Li <[email protected]>
Cc: Shawn Anastasio <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Steven Price <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Adam Ruprecht <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Cc: Cannon Matthews <[email protected]>
Cc: "Dr . David Alan Gilbert" <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Mina Almasry <[email protected]>
Cc: Oliver Upton <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
1 parent eb14d4e commit 7677f7f
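
Step 4 of the use case in the commit message (check whether the page is current, copy it through the non-UFFD mapping if not, then always continue) can be sketched roughly as below. This is only an illustration under stated assumptions, not code from the series: `sketch_page`, `sketch_resolve_minor_fault`, `sketch_count_continue` and `SKETCH_PAGE_SIZE` are hypothetical names, and the callback stands in for the real UFFDIO_CONTINUE ioctl, which this patch does not define.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical page size for the sketch; real hugetlbfs pages are larger. */
#define SKETCH_PAGE_SIZE 4096

/* Hypothetical per-page bookkeeping kept by the userfaultfd manager. */
struct sketch_page {
	unsigned char *alias;       /* the page, seen through the non-UFFD mapping */
	const unsigned char *fresh; /* latest contents received from the source */
	bool up_to_date;            /* false if the source changed it after the last copy */
};

/* Hypothetical stand-in for issuing UFFDIO_CONTINUE on the faulting range. */
typedef int (*sketch_continue_fn)(void *ctx, struct sketch_page *page);

/* Resolve one minor fault: fix the contents if needed, then always continue. */
static int sketch_resolve_minor_fault(struct sketch_page *page,
				      sketch_continue_fn cont, void *ctx)
{
	assert(page != NULL && cont != NULL);

	if (!page->up_to_date) {
		/* Update the contents via the second, non-UFFD mapping. */
		memcpy(page->alias, page->fresh, SKETCH_PAGE_SIZE);
		page->up_to_date = true;
	}
	/* Whether or not a copy happened, tell the kernel to carry on. */
	return cont(ctx, page);
}

/* Example callback for the sketch: just counts how often it was invoked. */
static int sketch_count_continue(void *ctx, struct sketch_page *page)
{
	(void)page;
	++*(int *)ctx;
	return 0;
}
```

A real manager would replace the callback with the ioctl on the userfaultfd file descriptor and would take the faulting address from the UFFD_EVENT_PAGEFAULT message.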

10 files changed, 150 additions and 62 deletions


arch/arm64/Kconfig

Lines changed: 1 addition & 0 deletions
@@ -213,6 +213,7 @@ config ARM64
 	select SWIOTLB
 	select SYSCTL_EXCEPTION_TRACE
 	select THREAD_INFO_IN_TASK
+	select HAVE_ARCH_USERFAULTFD_MINOR if USERFAULTFD
 	help
 	  ARM 64-bit (AArch64) Linux support.

arch/x86/Kconfig

Lines changed: 1 addition & 0 deletions
@@ -165,6 +165,7 @@ config X86
 	select HAVE_ARCH_TRANSPARENT_HUGEPAGE
 	select HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD if X86_64
 	select HAVE_ARCH_USERFAULTFD_WP if X86_64 && USERFAULTFD
+	select HAVE_ARCH_USERFAULTFD_MINOR if X86_64 && USERFAULTFD
 	select HAVE_ARCH_VMAP_STACK if X86_64
 	select HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET
 	select HAVE_ARCH_WITHIN_STACK_FRAMES

fs/proc/task_mmu.c

Lines changed: 3 additions & 0 deletions
@@ -661,6 +661,9 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
 		[ilog2(VM_PKEY_BIT4)] = "",
 #endif
 #endif /* CONFIG_ARCH_HAS_PKEYS */
+#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR
+		[ilog2(VM_UFFD_MINOR)] = "ui",
+#endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */
 	};
 	size_t i;

fs/userfaultfd.c

Lines changed: 47 additions & 31 deletions
@@ -197,24 +197,21 @@ static inline struct uffd_msg userfault_msg(unsigned long address,
 	msg_init(&msg);
 	msg.event = UFFD_EVENT_PAGEFAULT;
 	msg.arg.pagefault.address = address;
+	/*
+	 * These flags indicate why the userfault occurred:
+	 * - UFFD_PAGEFAULT_FLAG_WP indicates a write protect fault.
+	 * - UFFD_PAGEFAULT_FLAG_MINOR indicates a minor fault.
+	 * - Neither of these flags being set indicates a MISSING fault.
+	 *
+	 * Separately, UFFD_PAGEFAULT_FLAG_WRITE indicates it was a write
+	 * fault. Otherwise, it was a read fault.
+	 */
 	if (flags & FAULT_FLAG_WRITE)
-		/*
-		 * If UFFD_FEATURE_PAGEFAULT_FLAG_WP was set in the
-		 * uffdio_api.features and UFFD_PAGEFAULT_FLAG_WRITE
-		 * was not set in a UFFD_EVENT_PAGEFAULT, it means it
-		 * was a read fault, otherwise if set it means it's
-		 * a write fault.
-		 */
 		msg.arg.pagefault.flags |= UFFD_PAGEFAULT_FLAG_WRITE;
 	if (reason & VM_UFFD_WP)
-		/*
-		 * If UFFD_FEATURE_PAGEFAULT_FLAG_WP was set in the
-		 * uffdio_api.features and UFFD_PAGEFAULT_FLAG_WP was
-		 * not set in a UFFD_EVENT_PAGEFAULT, it means it was
-		 * a missing fault, otherwise if set it means it's a
-		 * write protect fault.
-		 */
 		msg.arg.pagefault.flags |= UFFD_PAGEFAULT_FLAG_WP;
+	if (reason & VM_UFFD_MINOR)
+		msg.arg.pagefault.flags |= UFFD_PAGEFAULT_FLAG_MINOR;
 	if (features & UFFD_FEATURE_THREAD_ID)
 		msg.arg.pagefault.feat.ptid = task_pid_vnr(current);
 	return msg;
@@ -401,8 +398,10 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)

 	BUG_ON(ctx->mm != mm);

-	VM_BUG_ON(reason & ~(VM_UFFD_MISSING|VM_UFFD_WP));
-	VM_BUG_ON(!(reason & VM_UFFD_MISSING) ^ !!(reason & VM_UFFD_WP));
+	/* Any unrecognized flag is a bug. */
+	VM_BUG_ON(reason & ~__VM_UFFD_FLAGS);
+	/* 0 or > 1 flags set is a bug; we expect exactly 1. */
+	VM_BUG_ON(!reason || (reason & (reason - 1)));

 	if (ctx->features & UFFD_FEATURE_SIGBUS)
 		goto out;
@@ -612,7 +611,7 @@ static void userfaultfd_event_wait_completion(struct userfaultfd_ctx *ctx,
 		for (vma = mm->mmap; vma; vma = vma->vm_next)
 			if (vma->vm_userfaultfd_ctx.ctx == release_new_ctx) {
 				vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
-				vma->vm_flags &= ~(VM_UFFD_WP | VM_UFFD_MISSING);
+				vma->vm_flags &= ~__VM_UFFD_FLAGS;
 			}
 		mmap_write_unlock(mm);

@@ -644,7 +643,7 @@ int dup_userfaultfd(struct vm_area_struct *vma, struct list_head *fcs)
 	octx = vma->vm_userfaultfd_ctx.ctx;
 	if (!octx || !(octx->features & UFFD_FEATURE_EVENT_FORK)) {
 		vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
-		vma->vm_flags &= ~(VM_UFFD_WP | VM_UFFD_MISSING);
+		vma->vm_flags &= ~__VM_UFFD_FLAGS;
 		return 0;
 	}

@@ -726,7 +725,7 @@ void mremap_userfaultfd_prep(struct vm_area_struct *vma,
 	} else {
 		/* Drop uffd context if remap feature not enabled */
 		vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
-		vma->vm_flags &= ~(VM_UFFD_WP | VM_UFFD_MISSING);
+		vma->vm_flags &= ~__VM_UFFD_FLAGS;
 	}
 }

@@ -867,12 +866,12 @@ static int userfaultfd_release(struct inode *inode, struct file *file)
 	for (vma = mm->mmap; vma; vma = vma->vm_next) {
 		cond_resched();
 		BUG_ON(!!vma->vm_userfaultfd_ctx.ctx ^
-		       !!(vma->vm_flags & (VM_UFFD_MISSING | VM_UFFD_WP)));
+		       !!(vma->vm_flags & __VM_UFFD_FLAGS));
 		if (vma->vm_userfaultfd_ctx.ctx != ctx) {
 			prev = vma;
 			continue;
 		}
-		new_flags = vma->vm_flags & ~(VM_UFFD_MISSING | VM_UFFD_WP);
+		new_flags = vma->vm_flags & ~__VM_UFFD_FLAGS;
 		prev = vma_merge(mm, prev, vma->vm_start, vma->vm_end,
 				 new_flags, vma->anon_vma,
 				 vma->vm_file, vma->vm_pgoff,
@@ -1262,9 +1261,19 @@ static inline bool vma_can_userfault(struct vm_area_struct *vma,
 				     unsigned long vm_flags)
 {
 	/* FIXME: add WP support to hugetlbfs and shmem */
-	return vma_is_anonymous(vma) ||
-		((is_vm_hugetlb_page(vma) || vma_is_shmem(vma)) &&
-		 !(vm_flags & VM_UFFD_WP));
+	if (vm_flags & VM_UFFD_WP) {
+		if (is_vm_hugetlb_page(vma) || vma_is_shmem(vma))
+			return false;
+	}
+
+	if (vm_flags & VM_UFFD_MINOR) {
+		/* FIXME: Add minor fault interception for shmem. */
+		if (!is_vm_hugetlb_page(vma))
+			return false;
+	}
+
+	return vma_is_anonymous(vma) || is_vm_hugetlb_page(vma) ||
+	       vma_is_shmem(vma);
 }

 static int userfaultfd_register(struct userfaultfd_ctx *ctx,
@@ -1290,14 +1299,19 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 	ret = -EINVAL;
 	if (!uffdio_register.mode)
 		goto out;
-	if (uffdio_register.mode & ~(UFFDIO_REGISTER_MODE_MISSING|
-				     UFFDIO_REGISTER_MODE_WP))
+	if (uffdio_register.mode & ~UFFD_API_REGISTER_MODES)
 		goto out;
 	vm_flags = 0;
 	if (uffdio_register.mode & UFFDIO_REGISTER_MODE_MISSING)
 		vm_flags |= VM_UFFD_MISSING;
 	if (uffdio_register.mode & UFFDIO_REGISTER_MODE_WP)
 		vm_flags |= VM_UFFD_WP;
+	if (uffdio_register.mode & UFFDIO_REGISTER_MODE_MINOR) {
+#ifndef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR
+		goto out;
+#endif
+		vm_flags |= VM_UFFD_MINOR;
+	}

 	ret = validate_range(mm, &uffdio_register.range.start,
 			     uffdio_register.range.len);
@@ -1341,7 +1355,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 		cond_resched();

 		BUG_ON(!!cur->vm_userfaultfd_ctx.ctx ^
-		       !!(cur->vm_flags & (VM_UFFD_MISSING | VM_UFFD_WP)));
+		       !!(cur->vm_flags & __VM_UFFD_FLAGS));

 		/* check not compatible vmas */
 		ret = -EINVAL;
@@ -1421,8 +1435,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 			start = vma->vm_start;
 		vma_end = min(end, vma->vm_end);

-		new_flags = (vma->vm_flags &
-			     ~(VM_UFFD_MISSING|VM_UFFD_WP)) | vm_flags;
+		new_flags = (vma->vm_flags & ~__VM_UFFD_FLAGS) | vm_flags;
 		prev = vma_merge(mm, prev, start, vma_end, new_flags,
 				 vma->anon_vma, vma->vm_file, vma->vm_pgoff,
 				 vma_policy(vma),
@@ -1544,7 +1557,7 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
 		cond_resched();

 		BUG_ON(!!cur->vm_userfaultfd_ctx.ctx ^
-		       !!(cur->vm_flags & (VM_UFFD_MISSING | VM_UFFD_WP)));
+		       !!(cur->vm_flags & __VM_UFFD_FLAGS));

 		/*
 		 * Check not compatible vmas, not strictly required
@@ -1595,7 +1608,7 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
 			wake_userfault(vma->vm_userfaultfd_ctx.ctx, &range);
 		}

-		new_flags = vma->vm_flags & ~(VM_UFFD_MISSING | VM_UFFD_WP);
+		new_flags = vma->vm_flags & ~__VM_UFFD_FLAGS;
 		prev = vma_merge(mm, prev, start, vma_end, new_flags,
 				 vma->anon_vma, vma->vm_file, vma->vm_pgoff,
 				 vma_policy(vma),
@@ -1863,6 +1876,9 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx,
 		goto err_out;
 	/* report all available features and ioctls to userland */
 	uffdio_api.features = UFFD_API_FEATURES;
+#ifndef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR
+	uffdio_api.features &= ~UFFD_FEATURE_MINOR_HUGETLBFS;
+#endif
 	uffdio_api.ioctls = UFFD_API_IOCTLS;
 	ret = -EFAULT;
 	if (copy_to_user(buf, &uffdio_api, sizeof(uffdio_api)))
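
The registration checks in this file (the mode mask test, the CONFIG_HAVE_ARCH_USERFAULTFD_MINOR guard, and the reworked vma_can_userfault()) combine into a small decision table. The sketch below models that table in plain userspace C so the outcomes can be checked. It is a simplification under stated assumptions: the `sk_` names and the `sk_vma_kind` enum are hypothetical, the mode bits are used directly instead of being translated to VM_UFFD_* flags first, and the architecture guard is a runtime parameter rather than a preprocessor check.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical classification of the backing memory of a VMA. */
enum sk_vma_kind { SK_VMA_ANON, SK_VMA_HUGETLB, SK_VMA_SHMEM, SK_VMA_OTHER };

/* Same bit positions as UFFDIO_REGISTER_MODE_* in the uapi header. */
#define SK_MODE_MISSING (UINT64_C(1) << 0)
#define SK_MODE_WP      (UINT64_C(1) << 1)
#define SK_MODE_MINOR   (UINT64_C(1) << 2)
#define SK_MODES        (SK_MODE_MISSING | SK_MODE_WP | SK_MODE_MINOR)

/* Models the post-patch vma_can_userfault(), keyed on mode bits. */
static bool sk_vma_can_userfault(enum sk_vma_kind kind, uint64_t mode)
{
	/* Write-protect is not supported on hugetlbfs or shmem here. */
	if ((mode & SK_MODE_WP) &&
	    (kind == SK_VMA_HUGETLB || kind == SK_VMA_SHMEM))
		return false;

	/* Minor fault interception is hugetlbfs-only in this patch. */
	if ((mode & SK_MODE_MINOR) && kind != SK_VMA_HUGETLB)
		return false;

	return kind == SK_VMA_ANON || kind == SK_VMA_HUGETLB ||
	       kind == SK_VMA_SHMEM;
}

/* Returns 0 if registration would pass these checks, -EINVAL otherwise. */
static int sk_check_register(enum sk_vma_kind kind, uint64_t mode,
			     bool arch_has_minor)
{
	if (!mode || (mode & ~SK_MODES))
		return -EINVAL;
	if ((mode & SK_MODE_MINOR) && !arch_has_minor)
		return -EINVAL;
	if (!sk_vma_can_userfault(kind, mode))
		return -EINVAL;
	return 0;
}
```

For example, `sk_check_register(SK_VMA_HUGETLB, SK_MODE_MISSING | SK_MODE_MINOR, true)` is accepted, while the same request against `SK_VMA_SHMEM` is rejected because of the MINOR bit.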

include/linux/mm.h

Lines changed: 7 additions & 0 deletions
@@ -372,6 +372,13 @@ extern unsigned int kobjsize(const void *objp);
 # define VM_GROWSUP VM_NONE
 #endif

+#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR
+# define VM_UFFD_MINOR_BIT 37
+# define VM_UFFD_MINOR BIT(VM_UFFD_MINOR_BIT) /* UFFD minor faults */
+#else /* !CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */
+# define VM_UFFD_MINOR VM_NONE
+#endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */
+
 /* Bits set in the VMA until the stack is in its final location */
 #define VM_STACK_INCOMPLETE_SETUP (VM_RAND_READ | VM_SEQ_READ)

include/linux/userfaultfd_k.h

Lines changed: 14 additions & 1 deletion
@@ -17,6 +17,9 @@
 #include <linux/mm.h>
 #include <asm-generic/pgtable_uffd.h>

+/* The set of all possible UFFD-related VM flags. */
+#define __VM_UFFD_FLAGS (VM_UFFD_MISSING | VM_UFFD_WP | VM_UFFD_MINOR)
+
 /*
  * CAREFUL: Check include/uapi/asm-generic/fcntl.h when defining
  * new flags, since they might collide with O_* ones. We want
@@ -71,6 +74,11 @@ static inline bool userfaultfd_wp(struct vm_area_struct *vma)
 	return vma->vm_flags & VM_UFFD_WP;
 }

+static inline bool userfaultfd_minor(struct vm_area_struct *vma)
+{
+	return vma->vm_flags & VM_UFFD_MINOR;
+}
+
 static inline bool userfaultfd_pte_wp(struct vm_area_struct *vma,
 				      pte_t pte)
 {
@@ -85,7 +93,7 @@ static inline bool userfaultfd_huge_pmd_wp(struct vm_area_struct *vma,

 static inline bool userfaultfd_armed(struct vm_area_struct *vma)
 {
-	return vma->vm_flags & (VM_UFFD_MISSING | VM_UFFD_WP);
+	return vma->vm_flags & __VM_UFFD_FLAGS;
 }

 extern int dup_userfaultfd(struct vm_area_struct *, struct list_head *);
@@ -132,6 +140,11 @@ static inline bool userfaultfd_wp(struct vm_area_struct *vma)
 	return false;
 }

+static inline bool userfaultfd_minor(struct vm_area_struct *vma)
+{
+	return false;
+}
+
 static inline bool userfaultfd_pte_wp(struct vm_area_struct *vma,
 				      pte_t pte)
 {
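
With three flags in __VM_UFFD_FLAGS, the old XOR-based sanity check in handle_userfault() no longer expresses "exactly one reason", which is presumably why the patch switches to the `reason & (reason - 1)` test. The sketch below compares the two checks in userspace. The `SK_` names are hypothetical, and the bit positions used for MISSING and WP are assumptions for illustration; only bit 37 for MINOR comes from this commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag values; only the MINOR bit (37) is taken from the patch. */
#define SK_VM_UFFD_MISSING (UINT64_C(1) << 9)   /* assumed position */
#define SK_VM_UFFD_WP      (UINT64_C(1) << 12)  /* assumed position */
#define SK_VM_UFFD_MINOR   (UINT64_C(1) << 37)
#define SK_VM_UFFD_FLAGS \
	(SK_VM_UFFD_MISSING | SK_VM_UFFD_WP | SK_VM_UFFD_MINOR)

/* Bit 37 only fits in a flags word wider than 32 bits. */
static_assert(sizeof(uint64_t) * 8 > 37, "bit 37 needs a 64-bit flags word");

/* Counterpart of userfaultfd_armed(): any UFFD flag set on the VMA. */
static bool sk_armed(uint64_t vm_flags)
{
	return (vm_flags & SK_VM_UFFD_FLAGS) != 0;
}

/* True when a fault reason would pass both post-patch VM_BUG_ON() checks. */
static bool sk_reason_is_valid(uint64_t reason)
{
	if (reason & ~SK_VM_UFFD_FLAGS)
		return false;           /* unrecognized flag */
	if (!reason || (reason & (reason - 1)))
		return false;           /* zero flags, or more than one */
	return true;
}

/* The pre-patch check, which only knows about MISSING and WP. */
static bool sk_old_reason_is_valid(uint64_t reason)
{
	if (reason & ~(SK_VM_UFFD_MISSING | SK_VM_UFFD_WP))
		return false;
	return !(!(reason & SK_VM_UFFD_MISSING) ^ !!(reason & SK_VM_UFFD_WP));
}
```

The expression `reason & (reason - 1)` clears the lowest set bit, so it is nonzero only when at least two bits are set; combined with the `!reason` test it accepts exactly the single-flag values regardless of how many flags the mask contains.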

include/trace/events/mmflags.h

Lines changed: 7 additions & 0 deletions
@@ -137,6 +137,12 @@ IF_HAVE_PG_ARCH_2(PG_arch_2, "arch_2" )
 #define IF_HAVE_VM_SOFTDIRTY(flag,name)
 #endif

+#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR
+# define IF_HAVE_UFFD_MINOR(flag, name) {flag, name},
+#else
+# define IF_HAVE_UFFD_MINOR(flag, name)
+#endif
+
 #define __def_vmaflag_names \
 	{VM_READ, "read" }, \
 	{VM_WRITE, "write" }, \
@@ -148,6 +154,7 @@ IF_HAVE_PG_ARCH_2(PG_arch_2, "arch_2" )
 	{VM_MAYSHARE, "mayshare" }, \
 	{VM_GROWSDOWN, "growsdown" }, \
 	{VM_UFFD_MISSING, "uffd_missing" }, \
+IF_HAVE_UFFD_MINOR(VM_UFFD_MINOR, "uffd_minor" ) \
 	{VM_PFNMAP, "pfnmap" }, \
 	{VM_DENYWRITE, "denywrite" }, \
 	{VM_UFFD_WP, "uffd_wp" }, \

include/uapi/linux/userfaultfd.h

Lines changed: 13 additions & 2 deletions
@@ -19,15 +19,19 @@
  * means the userland is reading).
  */
 #define UFFD_API ((__u64)0xAA)
+#define UFFD_API_REGISTER_MODES (UFFDIO_REGISTER_MODE_MISSING | \
+				 UFFDIO_REGISTER_MODE_WP | \
+				 UFFDIO_REGISTER_MODE_MINOR)
 #define UFFD_API_FEATURES (UFFD_FEATURE_PAGEFAULT_FLAG_WP | \
 			   UFFD_FEATURE_EVENT_FORK | \
 			   UFFD_FEATURE_EVENT_REMAP | \
-			   UFFD_FEATURE_EVENT_REMOVE | \
+			   UFFD_FEATURE_EVENT_REMOVE | \
 			   UFFD_FEATURE_EVENT_UNMAP | \
 			   UFFD_FEATURE_MISSING_HUGETLBFS | \
 			   UFFD_FEATURE_MISSING_SHMEM | \
 			   UFFD_FEATURE_SIGBUS | \
-			   UFFD_FEATURE_THREAD_ID)
+			   UFFD_FEATURE_THREAD_ID | \
+			   UFFD_FEATURE_MINOR_HUGETLBFS)
 #define UFFD_API_IOCTLS \
 	((__u64)1 << _UFFDIO_REGISTER | \
 	 (__u64)1 << _UFFDIO_UNREGISTER | \
@@ -127,6 +131,7 @@ struct uffd_msg {
 /* flags for UFFD_EVENT_PAGEFAULT */
 #define UFFD_PAGEFAULT_FLAG_WRITE (1<<0) /* If this was a write fault */
 #define UFFD_PAGEFAULT_FLAG_WP (1<<1) /* If reason is VM_UFFD_WP */
+#define UFFD_PAGEFAULT_FLAG_MINOR (1<<2) /* If reason is VM_UFFD_MINOR */

 struct uffdio_api {
 	/* userland asks for an API number and the features to enable */
@@ -171,6 +176,10 @@ struct uffdio_api {
 	 *
 	 * UFFD_FEATURE_THREAD_ID pid of the page faulted task_struct will
 	 * be returned, if feature is not requested 0 will be returned.
+	 *
+	 * UFFD_FEATURE_MINOR_HUGETLBFS indicates that minor faults
+	 * can be intercepted (via REGISTER_MODE_MINOR) for
+	 * hugetlbfs-backed pages.
 	 */
 #define UFFD_FEATURE_PAGEFAULT_FLAG_WP (1<<0)
 #define UFFD_FEATURE_EVENT_FORK (1<<1)
@@ -181,6 +190,7 @@ struct uffdio_api {
 #define UFFD_FEATURE_EVENT_UNMAP (1<<6)
 #define UFFD_FEATURE_SIGBUS (1<<7)
 #define UFFD_FEATURE_THREAD_ID (1<<8)
+#define UFFD_FEATURE_MINOR_HUGETLBFS (1<<9)
 	__u64 features;

 	__u64 ioctls;
@@ -195,6 +205,7 @@ struct uffdio_register {
 	struct uffdio_range range;
 #define UFFDIO_REGISTER_MODE_MISSING ((__u64)1<<0)
 #define UFFDIO_REGISTER_MODE_WP ((__u64)1<<1)
+#define UFFDIO_REGISTER_MODE_MINOR ((__u64)1<<2)
 	__u64 mode;

 	/*
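
On the userspace side, the new UFFD_PAGEFAULT_FLAG_MINOR bit lets a handler tell the three fault kinds apart, following the rule stated in the userfault_msg() comment: WP set means write-protect, MINOR set means minor, neither means missing, and WRITE is reported independently. A minimal sketch of that decoding is shown below. The `SK_` and `sk_` names are hypothetical, and the constants are local copies of the bit values from this header so the snippet builds without kernel headers; real code would include <linux/userfaultfd.h> and read the flags from `msg.arg.pagefault.flags`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the UFFD_PAGEFAULT_FLAG_* bit values from the uapi header. */
#define SK_PAGEFAULT_FLAG_WRITE (UINT64_C(1) << 0)
#define SK_PAGEFAULT_FLAG_WP    (UINT64_C(1) << 1)
#define SK_PAGEFAULT_FLAG_MINOR (UINT64_C(1) << 2)

/* Hypothetical classification used by the sketch. */
enum sk_fault_kind { SK_FAULT_MISSING, SK_FAULT_WP, SK_FAULT_MINOR };

/* Why did the userfault occur? The kernel reports exactly one reason. */
static enum sk_fault_kind sk_fault_kind(uint64_t flags)
{
	if (flags & SK_PAGEFAULT_FLAG_WP)
		return SK_FAULT_WP;
	if (flags & SK_PAGEFAULT_FLAG_MINOR)
		return SK_FAULT_MINOR;
	/* Neither flag set indicates a MISSING fault. */
	return SK_FAULT_MISSING;
}

/* Separately: was the faulting access a write (true) or a read (false)? */
static bool sk_fault_is_write(uint64_t flags)
{
	return (flags & SK_PAGEFAULT_FLAG_WRITE) != 0;
}
```

A handler would typically switch on the result of `sk_fault_kind()` and pick the matching resolution ioctl for each case.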

init/Kconfig

Lines changed: 5 additions & 0 deletions
@@ -1644,6 +1644,11 @@ config HAVE_ARCH_USERFAULTFD_WP
 	help
 	  Arch has userfaultfd write protection support

+config HAVE_ARCH_USERFAULTFD_MINOR
+	bool
+	help
+	  Arch has userfaultfd minor fault support
+
 config MEMBARRIER
 	bool "Enable membarrier() system call" if EXPERT
 	default y
