Ignore diagonals for iMON PAD in keyboard mode #102
Closed
For users with an iMON PAD remote control, keyboard mode is very touchy and almost useless with XBMC. Even with the stabilized() algorithm, the behaviour is unexpected. To make it less touchy, I made it ignore any value that is too close to a diagonal.
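The PR text does not include the patch itself. As a rough sketch of the idea only (not the actual imon driver change), the C below maps a pad displacement to a single arrow direction and drops readings that sit near a diagonal. The names `pad_to_dir` and `enum pad_dir`, the `margin` threshold, and the "positive y means down" convention are all assumptions made for illustration.

```c
#include <stdlib.h>

/* Hypothetical direction codes; the real driver emits input key codes. */
enum pad_dir { PAD_NONE, PAD_UP, PAD_DOWN, PAD_LEFT, PAD_RIGHT };

/*
 * Map a pad displacement (x, y) to one arrow direction.
 * If |x| and |y| differ by less than `margin`, the motion is too close
 * to a diagonal to tell which arrow was meant, so it is ignored.
 * Assumed convention: positive y is "down" (screen coordinates).
 */
static enum pad_dir pad_to_dir(int x, int y, int margin)
{
	int ax = abs(x), ay = abs(y);

	if (ax == 0 && ay == 0)
		return PAD_NONE;          /* no movement at all */
	if (abs(ax - ay) < margin)
		return PAD_NONE;          /* near a diagonal: ignore */
	if (ax > ay)
		return x > 0 ? PAD_RIGHT : PAD_LEFT;   /* horizontal dominates */
	return y > 0 ? PAD_DOWN : PAD_UP;          /* vertical dominates */
}
```

With `margin` set to 3, a reading of (10, 1) maps to `PAD_RIGHT`, while (5, 6) is discarded as diagonal. A larger `margin` widens the dead zone around each diagonal, trading responsiveness for fewer spurious key presses.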
hubcapsc pushed a commit to hubcapsc/linux that referenced this pull request on Jul 2, 2014
Turn it into (for example):

[ 0.073380] x86: Booting SMP configuration:
[ 0.074005] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7
[ 0.603005] .... node #1, CPUs: #8 #9 #10 #11 #12 #13 #14 #15
[ 1.200005] .... node #2, CPUs: #16 #17 #18 #19 #20 #21 #22 #23
[ 1.796005] .... node #3, CPUs: #24 #25 #26 #27 #28 #29 #30 #31
[ 2.393005] .... node #4, CPUs: #32 #33 #34 #35 #36 #37 #38 #39
[ 2.996005] .... node #5, CPUs: #40 #41 #42 #43 #44 #45 #46 #47
[ 3.600005] .... node #6, CPUs: #48 #49 #50 #51 #52 #53 #54 #55
[ 4.202005] .... node #7, CPUs: #56 #57 #58 #59 #60 #61 #62 #63
[ 4.811005] .... node #8, CPUs: #64 #65 #66 #67 #68 #69 #70 #71
[ 5.421006] .... node #9, CPUs: #72 #73 #74 #75 #76 #77 #78 #79
[ 6.032005] .... node #10, CPUs: #80 #81 #82 #83 #84 #85 #86 #87
[ 6.648006] .... node #11, CPUs: #88 #89 #90 #91 #92 #93 #94 #95
[ 7.262005] .... node #12, CPUs: #96 #97 #98 #99 #100 #101 #102 #103
[ 7.865005] .... node #13, CPUs: #104 #105 #106 #107 #108 #109 #110 #111
[ 8.466005] .... node #14, CPUs: #112 #113 #114 #115 #116 #117 #118 #119
[ 9.073006] .... node #15, CPUs: #120 #121 #122 #123 #124 #125 #126 #127
[ 9.679901] x86: Booted up 16 nodes, 128 CPUs

and drop useless elements.

Change num_digits() to hpa's division-avoiding, cell-phone-typed version which he went at great lengths and pains to submit on a Saturday evening.

Signed-off-by: Borislav Petkov <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Linus Torvalds <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
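The "division-avoiding" num_digits() idea mentioned above can be sketched as follows. This is an illustrative reconstruction, not hpa's exact code: it counts decimal digits by growing a power of ten instead of repeatedly dividing by 10. Widening to `long long` is an assumption of this sketch so that negating `INT_MIN` and multiplying past `INT_MAX` stay well defined.

```c
/*
 * Number of characters needed to print val in decimal, including a
 * leading '-' for negative values, computed without any division.
 */
static int num_digits(int val)
{
	long long v = val;   /* widen so -INT_MIN is representable */
	long long m = 10;    /* smallest value that needs d + 1 digits */
	int d = 1;

	if (v < 0) {
		d++;             /* room for the minus sign */
		v = -v;
	}

	while (v >= m) {     /* each power of ten passed adds a digit */
		m *= 10;         /* 64-bit m cannot overflow for 32-bit inputs */
		d++;
	}
	return d;
}
```

For example, `num_digits(127)` is 3 and `num_digits(-5)` is 2. Compared with a `val /= 10` loop, this does one cheap multiply per digit, which is the point of avoiding division in early boot code.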
hubcapsc pushed a commit to hubcapsc/linux that referenced this pull request on Jul 2, 2014
These mappings are in fact special and require special handling in privcmd, which already exists. Failure to mark the PTE as special on arm64 causes all sorts of bad PTE fun, e.g.:

BUG: Bad page map in process xl pte:e0004077b33f53 pmd:4079575003
page:ffffffbce1a2f328 count:1 mapcount:-1 mapping: (null) index:0x0
page flags: 0x4000000000000014(referenced|dirty)
addr:0000007fb5259000 vm_flags:040644fa anon_vma: (null) mapping:ffffffc03a6fda58 index:0
vma->vm_ops->fault: privcmd_fault+0x0/0x38
vma->vm_file->f_op->mmap: privcmd_mmap+0x0/0x2c
CPU: 0 PID: 2657 Comm: xl Not tainted 3.12.0+ #102
Call trace:
[<ffffffc0000880f8>] dump_backtrace+0x0/0x12c
[<ffffffc000088238>] show_stack+0x14/0x1c
[<ffffffc0004b67e0>] dump_stack+0x70/0x90
[<ffffffc000125690>] print_bad_pte+0x12c/0x1bc
[<ffffffc0001268f4>] unmap_single_vma+0x4cc/0x700
[<ffffffc0001273b4>] unmap_vmas+0x68/0xb4
[<ffffffc00012c050>] unmap_region+0xcc/0x1d4
[<ffffffc00012df20>] do_munmap+0x218/0x314
[<ffffffc00012e060>] vm_munmap+0x44/0x64
[<ffffffc00012ed78>] SyS_munmap+0x24/0x34

Where unmap_single_vma contains inlined -> unmap_page_range -> zap_pud_range -> zap_pmd_range -> zap_pte_range -> print_bad_pte.

Or:

BUG: Bad page state in process xl pfn:4077b4d
page:ffffffbce1a2f8d8 count:0 mapcount:-1 mapping: (null) index:0x0
page flags: 0x4000000000000014(referenced|dirty)
Modules linked in:
CPU: 0 PID: 2657 Comm: xl Tainted: G B 3.12.0+ #102
Call trace:
[<ffffffc0000880f8>] dump_backtrace+0x0/0x12c
[<ffffffc000088238>] show_stack+0x14/0x1c
[<ffffffc0004b67e0>] dump_stack+0x70/0x90
[<ffffffc00010f798>] bad_page+0xc4/0x110
[<ffffffc00010f8b4>] free_pages_prepare+0xd0/0xd8
[<ffffffc000110e94>] free_hot_cold_page+0x28/0x178
[<ffffffc000111460>] free_hot_cold_page_list+0x38/0x60
[<ffffffc000114cf0>] release_pages+0x190/0x1dc
[<ffffffc00012c0e0>] unmap_region+0x15c/0x1d4
[<ffffffc00012df20>] do_munmap+0x218/0x314
[<ffffffc00012e060>] vm_munmap+0x44/0x64
[<ffffffc00012ed78>] SyS_munmap+0x24/0x34

x86 already gets this correct. 32-bit arm gets away with this because there is no PTE_SPECIAL bit in the PTE there and the vm_normal_page fallback path does the right thing.

Signed-off-by: Ian Campbell <[email protected]>
Signed-off-by: Stefano Stabellini <[email protected]>
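Why arm64 breaks while 32-bit arm does not can be shown with a deliberately tiny user-space model of the decision vm_normal_page() makes. This is not kernel code: `model_is_normal_page`, `struct model_vma` and `struct model_pte` are hypothetical, the real fallback path has more cases (COW mappings, VM_MIXEDMAP), and `MODEL_VM_PFNMAP` is assumed to mirror the kernel's `VM_PFNMAP` value of that era, a bit that is set in the `vm_flags:040644fa` shown in the log.

```c
#include <stdbool.h>

/* Assumed to mirror VM_PFNMAP (0x400); vm_flags 0x040644fa has it set. */
#define MODEL_VM_PFNMAP 0x00000400UL

struct model_vma { unsigned long vm_flags; };
struct model_pte { bool special; };   /* only the bit this model needs */

/*
 * Should the core mm treat the page behind this PTE as a normal,
 * refcounted page (and therefore adjust its counts on unmap)?
 * arch_has_special: whether the architecture has a PTE special bit.
 */
static bool model_is_normal_page(const struct model_vma *vma,
				 struct model_pte pte, bool arch_has_special)
{
	if (arch_has_special)
		return !pte.special;   /* arm64/x86: the bit alone decides */

	/* 32-bit arm: no special bit, fall back to the VMA flags (simplified). */
	return !(vma->vm_flags & MODEL_VM_PFNMAP);
}
```

In this model an unmarked privcmd PTE on an architecture with the special bit is misclassified as a normal page, which corresponds to the bogus mapcount and "Bad page" reports above; marking it special, or lacking the bit entirely, both lead to the page being left alone.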
pstglia pushed a commit to pstglia/linux that referenced this pull request on Oct 6, 2014
During an EEH hotplug event, iommu_add_device() is invoked three times,
and two of those calls trigger a warning or an error.
The three call sites of iommu_add_device() are:

pci_device_add
    ...
    set_iommu_table_base_and_group <- 1st time, fails
device_add
    ...
    tce_iommu_bus_notifier <- 2nd time, succeeds
pcibios_add_pci_devices
    ...
    pcibios_setup_bus_devices <- 3rd time, re-attach
The first call fails because dev->kobj->sd is not yet initialized; it is
only initialized in device_add().
The third time's warning is triggered by the re-attach of the iommu_group.
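The ordering problem can be replayed with a small hypothetical model. Nothing here is powerpc code: `struct model_dev`, `model_iommu_add_device`, `model_device_add` and `model_eeh_call_result` are invented names, and returning `-EBUSY` on re-attach is a stand-in for the WARNING shown in the log. Returning `-EFAULT` before sysfs exists is chosen because it is numerically the `ret=-14` reported below (EFAULT is 14 on Linux).

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the parts of struct device that matter here. */
struct model_dev {
	bool sysfs_ready;   /* models dev->kobj->sd, set up by device_add() */
	bool in_group;      /* models the device already having an iommu_group */
};

/* 0 on success, -EFAULT before sysfs exists, -EBUSY on re-attach. */
static int model_iommu_add_device(struct model_dev *dev)
{
	if (!dev->sysfs_ready)
		return -EFAULT;     /* too early: no sysfs node to link to */
	if (dev->in_group)
		return -EBUSY;      /* re-attach; the real kernel WARNs here */
	dev->in_group = true;
	return 0;
}

static void model_device_add(struct model_dev *dev)
{
	dev->sysfs_ready = true;    /* device_add() creates the sysfs node */
}

/* Replay the hotplug order from a fresh device; return rc of call n (1..3). */
static int model_eeh_call_result(int n)
{
	struct model_dev dev = { false, false };
	int rc;

	rc = model_iommu_add_device(&dev);   /* 1st: via pci_device_add */
	if (n == 1)
		return rc;
	model_device_add(&dev);
	rc = model_iommu_add_device(&dev);   /* 2nd: tce_iommu_bus_notifier */
	if (n == 2)
		return rc;
	return model_iommu_add_device(&dev); /* 3rd: pcibios_setup_bus_devices */
}
```

Only the second call, made after device_add() and before any group exists, succeeds in this model; dropping the other two call sites leaves exactly that one.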
After applying this patch, the error
iommu_tce: 0003:05:00.0 has not been added, ret=-14
and the warning
[ 204.123609] ------------[ cut here ]------------
[ 204.123645] WARNING: at arch/powerpc/kernel/iommu.c:1125
[ 204.123680] Modules linked in: xt_CHECKSUM nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6t_REJECT bnep bluetooth 6lowpan_iphc rfkill xt_conntrack ebtable_nat ebtable_broute bridge stp llc mlx4_ib ib_sa ib_mad ib_core ib_addr ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw bnx2x tg3 mlx4_core nfsd ptp mdio ses libcrc32c nfs_acl enclosure be2net pps_core shpchp lockd kvm uinput sunrpc binfmt_misc lpfc scsi_transport_fc ipr scsi_tgt
[ 204.124356] CPU: 18 PID: 650 Comm: eehd Not tainted 3.14.0-rc5yw+ #102
[ 204.124400] task: c0000027ed485670 ti: c0000027ed50c000 task.ti: c0000027ed50c000
[ 204.124453] NIP: c00000000003cf80 LR: c00000000006c648 CTR: c00000000006c5c0
[ 204.124506] REGS: c0000027ed50f440 TRAP: 0700 Not tainted (3.14.0-rc5yw+)
[ 204.124558] MSR: 9000000000029032 <SF,HV,EE,ME,IR,DR,RI> CR: 88008084 XER: 20000000
[ 204.124682] CFAR: c00000000006c644 SOFTE: 1
GPR00: c00000000006c648 c0000027ed50f6c0 c000000001398380 c0000027ec260300
GPR04: c0000027ea92c000 c00000000006ad00 c0000000016e41b0 0000000000000110
GPR08: c0000000012cd4c0 0000000000000001 c0000027ec2602ff 0000000000000062
GPR12: 0000000028008084 c00000000fdca200 c0000000000d1d90 c0000027ec281a80
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000001
GPR24: 000000005342697b 0000000000002906 c000001fe6ac9800 c000001fe6ac9800
GPR28: 0000000000000000 c0000000016e3a80 c0000027ea92c090 c0000027ea92c000
[ 204.125353] NIP [c00000000003cf80] .iommu_add_device+0x30/0x1f0
[ 204.125399] LR [c00000000006c648] .pnv_pci_ioda_dma_dev_setup+0x88/0xb0
[ 204.125443] Call Trace:
[ 204.125464] [c0000027ed50f6c0] [c0000027ed50f750] 0xc0000027ed50f750 (unreliable)
[ 204.125526] [c0000027ed50f750] [c00000000006c648] .pnv_pci_ioda_dma_dev_setup+0x88/0xb0
[ 204.125588] [c0000027ed50f7d0] [c000000000069cc8] .pnv_pci_dma_dev_setup+0x78/0x340
[ 204.125650] [c0000027ed50f870] [c000000000044408] .pcibios_setup_device+0x88/0x2f0
[ 204.125712] [c0000027ed50f940] [c000000000046040] .pcibios_setup_bus_devices+0x60/0xd0
[ 204.125774] [c0000027ed50f9c0] [c000000000043acc] .pcibios_add_pci_devices+0xdc/0x1c0
[ 204.125837] [c0000027ed50fa50] [c00000000086f970] .eeh_reset_device+0x36c/0x4f0
[ 204.125939] [c0000027ed50fb20] [c00000000003a2d8] .eeh_handle_normal_event+0x448/0x480
[ 204.126068] [c0000027ed50fbc0] [c00000000003a35c] .eeh_handle_event+0x4c/0x340
[ 204.126192] [c0000027ed50fc80] [c00000000003a74c] .eeh_event_handler+0xfc/0x1b0
[ 204.126319] [c0000027ed50fd30] [c0000000000d1ea0] .kthread+0x110/0x130
[ 204.126430] [c0000027ed50fe30] [c00000000000a460] .ret_from_kernel_thread+0x5c/0x7c
[ 204.126556] Instruction dump:
[ 204.126610] 7c0802a6 fba1ffe8 fbc1fff0 fbe1fff8 f8010010 f821ff71 7c7e1b78 60000000
[ 204.126787] 60000000 e87e0298 3143ffff 7d2a1910 <0b090000> 2fa90000 40de00c8 ebfe0218
[ 204.126966] ---[ end trace 6e7aefd80add2973 ]---
are cleared.
This patch removes iommu_add_device() from pnv_pci_ioda_dma_dev_setup(), which
reverts part of the change in commit d905c5d ("PPC: POWERNV: move
iommu_add_device earlier").
Signed-off-by: Wei Yang <[email protected]>
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
apxii pushed a commit to apxii/linux that referenced this pull request on May 6, 2015
I2C kernel module removal fix
0day-ci pushed a commit to 0day-ci/linux that referenced this pull request on Oct 18, 2015
This patch fixes a typo in khugepaged_scan_pmd(): instead of setting "result" to SCAN_EXCEED_SWAP_PTE we set "ret". Setting "ret" results in an attempt to collapse a huge page although we meant aborting the scan. As a result, we can call khugepaged_find_target_node() with all entries in the khugepaged_node_load array being zeros. The latter is not ready for that and might return an offline node on such input. This leads to a warning followed by kernel panic:

WARNING: CPU: 1 PID: 40 at include/linux/gfp.h:314 khugepaged_alloc_page+0xd4/0xf0()
CPU: 1 PID: 40 Comm: khugepaged Not tainted 4.3.0-rc1-mm1+ #102
000000000000013a ffff88010ae77b58 ffffffff813270d4 ffffffff818cda31
0000000000000000 ffff88010ae77b98 ffffffff8107c9f5 dead000000000100
ffff88010ae77e70 0000000000c752da 0000000000000001 0000000000000000
Call Trace:
[<ffffffff813270d4>] dump_stack+0x48/0x64
[<ffffffff8107c9f5>] warn_slowpath_common+0x95/0xe0
[<ffffffff8107ca5a>] warn_slowpath_null+0x1a/0x20
[<ffffffff811ec124>] khugepaged_alloc_page+0xd4/0xf0
[<ffffffff811f15c8>] collapse_huge_page+0x58/0x550
[<ffffffff810b38e6>] ? account_entity_dequeue+0xb6/0xd0
[<ffffffff810b5289>] ? idle_balance+0x79/0x2b0
[<ffffffff811f1f5e>] khugepaged_scan_pmd+0x49e/0x710
[<ffffffff810e1f3a>] ? lock_timer_base+0x5a/0x80
[<ffffffff810e1fbb>] ? try_to_del_timer_sync+0x5b/0x70
[<ffffffff810e214c>] ? del_timer_sync+0x4c/0x60
[<ffffffff8168242f>] ? schedule_timeout+0x11f/0x200
[<ffffffff811f2330>] khugepaged_scan_mm_slot+0x160/0x2a0
[<ffffffff811f255f>] khugepaged_do_scan+0xef/0x160
[<ffffffff810bcdb0>] ? wait_woken+0x80/0x80
[<ffffffff811f25d0>] ? khugepaged_do_scan+0x160/0x160
[<ffffffff811f25f8>] khugepaged+0x28/0x80
[<ffffffff8109ab1c>] kthread+0xcc/0xf0
[<ffffffff810a667e>] ? schedule_tail+0x1e/0xc0
[<ffffffff8109aa50>] ? kthread_freezable_should_stop+0x70/0x70
[<ffffffff8168371f>] ret_from_fork+0x3f/0x70
[<ffffffff8109aa50>] ? kthread_freezable_should_stop+0x70/0x70
BUG: unable to handle kernel paging request at 0000000000014028
IP: [<ffffffff81185eb2>] __alloc_pages_nodemask+0xc2/0x2c0
PGD aaac7067 PUD aaac606 PMD 0
Oops: 0000 [#1] SMP
CPU: 1 PID: 40 Comm: khugepaged Tainted: G W 4.3.0-rc1-mm1+ #102
task: ffff88010ae16400 ti: ffff88010ae74000 task.ti: ffff88010ae74000
RIP: 0010:[<ffffffff81185eb2>] [<ffffffff81185eb2>] __alloc_pages_nodemask+0xc2/0x2c0
RSP: 0018:ffff88010ae77ad8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000014020 RCX: 0000000000000014
RDX: 0000000000000000 RSI: 0000000000000009 RDI: 0000000000c752da
RBP: ffff88010ae77ba8 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000297 R12: 0000000000000000
R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000c752da
FS: 0000000000000000(0000) GS:ffff88010be40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000014028 CR3: 00000000aaac4000 CR4: 00000000000006e0
Stack:
ffff88010ae77ae8 ffffffff810d0b3b ffff88010ae77b48 ffffffff81179e73
0000000000000010 ffff88010ae77b58 ffff88010ae77b18 ffffffff811ec124
ffff88010ae77b38 00000009a6e3aff4 0000000000000000 0000000000000000
Call Trace:
[<ffffffff810d0b3b>] ? vprintk_default+0x2b/0x40
[<ffffffff81179e73>] ? printk+0x46/0x48
[<ffffffff811ec124>] ? khugepaged_alloc_page+0xd4/0xf0
[<ffffffff8107ca04>] ? warn_slowpath_common+0xa4/0xe0
[<ffffffff811ec0cd>] khugepaged_alloc_page+0x7d/0xf0
[<ffffffff811f15c8>] collapse_huge_page+0x58/0x550
[<ffffffff810b38e6>] ? account_entity_dequeue+0xb6/0xd0
[<ffffffff810b5289>] ? idle_balance+0x79/0x2b0
[<ffffffff811f1f5e>] khugepaged_scan_pmd+0x49e/0x710
[<ffffffff810e1f3a>] ? lock_timer_base+0x5a/0x80
[<ffffffff810e1fbb>] ? try_to_del_timer_sync+0x5b/0x70
[<ffffffff810e214c>] ? del_timer_sync+0x4c/0x60
[<ffffffff8168242f>] ? schedule_timeout+0x11f/0x200
[<ffffffff811f2330>] khugepaged_scan_mm_slot+0x160/0x2a0
[<ffffffff811f255f>] khugepaged_do_scan+0xef/0x160
[<ffffffff810bcdb0>] ? wait_woken+0x80/0x80
[<ffffffff811f25d0>] ? khugepaged_do_scan+0x160/0x160
[<ffffffff811f25f8>] khugepaged+0x28/0x80
[<ffffffff8109ab1c>] kthread+0xcc/0xf0
[<ffffffff810a667e>] ? schedule_tail+0x1e/0xc0
[<ffffffff8109aa50>] ? kthread_freezable_should_stop+0x70/0x70
[<ffffffff8168371f>] ret_from_fork+0x3f/0x70
[<ffffffff8109aa50>] ? kthread_freezable_should_stop+0x70/0x70
RIP [<ffffffff81185eb2>] __alloc_pages_nodemask+0xc2/0x2c0
RSP <ffff88010ae77ad8>
CR2: 0000000000014028

Fixes: acc067d ("mm: make optimistic check for swapin readahead")
Signed-off-by: Vladimir Davydov <[email protected]>
Cc: Ebru Akagunduz <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
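To make the failure chain concrete, here is a heavily simplified C model of the two pieces the commit describes. It is not the mm/khugepaged code: `model_scan_pmd`, `model_find_target_node`, the array-based inputs and the `enum scan_result` values (named after the ones in the message) are hypothetical, and the real scan performs many more checks. The first function shows why the abort path must set `result` and leave `ret` alone; the second shows that a plain "first node with the highest load" search returns node 0 when every load is zero, whether or not node 0 is online.

```c
#include <stdbool.h>

/* Hypothetical result codes, named after the ones in the commit message. */
enum scan_result { SCAN_SUCCEED, SCAN_EXCEED_SWAP_PTE };

/*
 * Model of the scan: count swap PTEs and abort once more than
 * max_ptes_swap are seen. `ret` decides whether a collapse is attempted;
 * `result` only records why. Returns true if a collapse would be attempted.
 */
static bool model_scan_pmd(const bool *is_swap, int n, int max_ptes_swap,
			   enum scan_result *result)
{
	int ret = 0, unmapped = 0;

	*result = SCAN_SUCCEED;
	for (int i = 0; i < n; i++) {
		if (is_swap[i] && ++unmapped > max_ptes_swap) {
			*result = SCAN_EXCEED_SWAP_PTE;  /* fixed: set result, not ret */
			return ret != 0;                 /* ret still 0: no collapse */
		}
	}
	ret = 1;                                 /* every check passed */
	return ret != 0;
}

/* First node with the highest load; all-zero input falls through to node 0. */
static int model_find_target_node(const int *load, int nr_nodes)
{
	int target = 0, max = 0;

	for (int nid = 0; nid < nr_nodes; nid++)
		if (load[nid] > max) {
			max = load[nid];
			target = nid;
		}
	return target;
}
```

With the typo, the abort path would set `ret` instead, the model would report a collapse, and the allocation would then target whatever `model_find_target_node` returns for an all-zero load array.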
0day-ci pushed a commit to 0day-ci/linux that referenced this pull request on Oct 21, 2015
0day-ci pushed a commit to 0day-ci/linux that referenced this pull request on Oct 22, 2015
nhoriguchi pushed a commit to nhoriguchi/linux that referenced this pull request on Oct 30, 2015
0day-ci pushed a commit to 0day-ci/linux that referenced this pull request on Nov 11, 2015
This patch fixes a typo in khugepaged_scan_pmd(): instead of setting "result" to SCAN_EXCEED_SWAP_PTE, we set "ret". Setting "ret" results in an attempt to collapse a huge page although we meant to abort the scan. As a result, we can call khugepaged_find_target_node() with every entry of the khugepaged_node_load array equal to zero. The latter does not handle that input and may return an offline node. This leads to a warning followed by a kernel panic:

  WARNING: CPU: 1 PID: 40 at include/linux/gfp.h:314 khugepaged_alloc_page+0xd4/0xf0()
  CPU: 1 PID: 40 Comm: khugepaged Not tainted 4.3.0-rc1-mm1+ #102
   000000000000013a ffff88010ae77b58 ffffffff813270d4 ffffffff818cda31
   0000000000000000 ffff88010ae77b98 ffffffff8107c9f5 dead000000000100
   ffff88010ae77e70 0000000000c752da 0000000000000001 0000000000000000
  Call Trace:
   [<ffffffff813270d4>] dump_stack+0x48/0x64
   [<ffffffff8107c9f5>] warn_slowpath_common+0x95/0xe0
   [<ffffffff8107ca5a>] warn_slowpath_null+0x1a/0x20
   [<ffffffff811ec124>] khugepaged_alloc_page+0xd4/0xf0
   [<ffffffff811f15c8>] collapse_huge_page+0x58/0x550
   [<ffffffff810b38e6>] ? account_entity_dequeue+0xb6/0xd0
   [<ffffffff810b5289>] ? idle_balance+0x79/0x2b0
   [<ffffffff811f1f5e>] khugepaged_scan_pmd+0x49e/0x710
   [<ffffffff810e1f3a>] ? lock_timer_base+0x5a/0x80
   [<ffffffff810e1fbb>] ? try_to_del_timer_sync+0x5b/0x70
   [<ffffffff810e214c>] ? del_timer_sync+0x4c/0x60
   [<ffffffff8168242f>] ? schedule_timeout+0x11f/0x200
   [<ffffffff811f2330>] khugepaged_scan_mm_slot+0x160/0x2a0
   [<ffffffff811f255f>] khugepaged_do_scan+0xef/0x160
   [<ffffffff810bcdb0>] ? wait_woken+0x80/0x80
   [<ffffffff811f25d0>] ? khugepaged_do_scan+0x160/0x160
   [<ffffffff811f25f8>] khugepaged+0x28/0x80
   [<ffffffff8109ab1c>] kthread+0xcc/0xf0
   [<ffffffff810a667e>] ? schedule_tail+0x1e/0xc0
   [<ffffffff8109aa50>] ? kthread_freezable_should_stop+0x70/0x70
   [<ffffffff8168371f>] ret_from_fork+0x3f/0x70
   [<ffffffff8109aa50>] ? kthread_freezable_should_stop+0x70/0x70
  BUG: unable to handle kernel paging request at 0000000000014028
  IP: [<ffffffff81185eb2>] __alloc_pages_nodemask+0xc2/0x2c0
  PGD aaac7067 PUD aaac606 PMD 0
  Oops: 0000 [#1] SMP
  CPU: 1 PID: 40 Comm: khugepaged Tainted: G W 4.3.0-rc1-mm1+ #102
  task: ffff88010ae16400 ti: ffff88010ae74000 task.ti: ffff88010ae74000
  RIP: 0010:[<ffffffff81185eb2>] [<ffffffff81185eb2>] __alloc_pages_nodemask+0xc2/0x2c0
  RSP: 0018:ffff88010ae77ad8 EFLAGS: 00010246
  RAX: 0000000000000000 RBX: 0000000000014020 RCX: 0000000000000014
  RDX: 0000000000000000 RSI: 0000000000000009 RDI: 0000000000c752da
  RBP: ffff88010ae77ba8 R08: 0000000000000000 R09: 0000000000000001
  R10: 0000000000000000 R11: 0000000000000297 R12: 0000000000000000
  R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000c752da
  FS: 0000000000000000(0000) GS:ffff88010be40000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 0000000000014028 CR3: 00000000aaac4000 CR4: 00000000000006e0
  Stack:
   ffff88010ae77ae8 ffffffff810d0b3b ffff88010ae77b48 ffffffff81179e73
   0000000000000010 ffff88010ae77b58 ffff88010ae77b18 ffffffff811ec124
   ffff88010ae77b38 00000009a6e3aff4 0000000000000000 0000000000000000
  Call Trace:
   [<ffffffff810d0b3b>] ? vprintk_default+0x2b/0x40
   [<ffffffff81179e73>] ? printk+0x46/0x48
   [<ffffffff811ec124>] ? khugepaged_alloc_page+0xd4/0xf0
   [<ffffffff8107ca04>] ? warn_slowpath_common+0xa4/0xe0
   [<ffffffff811ec0cd>] khugepaged_alloc_page+0x7d/0xf0
   [<ffffffff811f15c8>] collapse_huge_page+0x58/0x550
   [<ffffffff810b38e6>] ? account_entity_dequeue+0xb6/0xd0
   [<ffffffff810b5289>] ? idle_balance+0x79/0x2b0
   [<ffffffff811f1f5e>] khugepaged_scan_pmd+0x49e/0x710
   [<ffffffff810e1f3a>] ? lock_timer_base+0x5a/0x80
   [<ffffffff810e1fbb>] ? try_to_del_timer_sync+0x5b/0x70
   [<ffffffff810e214c>] ? del_timer_sync+0x4c/0x60
   [<ffffffff8168242f>] ? schedule_timeout+0x11f/0x200
   [<ffffffff811f2330>] khugepaged_scan_mm_slot+0x160/0x2a0
   [<ffffffff811f255f>] khugepaged_do_scan+0xef/0x160
   [<ffffffff810bcdb0>] ? wait_woken+0x80/0x80
   [<ffffffff811f25d0>] ? khugepaged_do_scan+0x160/0x160
   [<ffffffff811f25f8>] khugepaged+0x28/0x80
   [<ffffffff8109ab1c>] kthread+0xcc/0xf0
   [<ffffffff810a667e>] ? schedule_tail+0x1e/0xc0
   [<ffffffff8109aa50>] ? kthread_freezable_should_stop+0x70/0x70
   [<ffffffff8168371f>] ret_from_fork+0x3f/0x70
   [<ffffffff8109aa50>] ? kthread_freezable_should_stop+0x70/0x70
  RIP [<ffffffff81185eb2>] __alloc_pages_nodemask+0xc2/0x2c0
   RSP <ffff88010ae77ad8>
  CR2: 0000000000014028

Fixes: acc067d ("mm: make optimistic check for swapin readahead")
Signed-off-by: Vladimir Davydov <[email protected]>
Cc: Ebru Akagunduz <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
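To make the failure mode concrete, here is a simplified userspace model in C. It is a sketch, not the kernel code: the names (scan_pmd, find_target_node, node_load, node_online, is_online), the four-node topology with two offline nodes, and the "pick the most-loaded node, rotate among ties" selection policy are all assumptions chosen for illustration. It shows how storing the abort code in ret instead of result turns an abort into a collapse attempt, and how an all-zero load array lets the target selection land on an offline node.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: MAX_NODES, the online map and all names are illustrative. */
#define MAX_NODES 4

enum scan_result { SCAN_FAIL = 0, SCAN_SUCCEED, SCAN_EXCEED_SWAP_PTE };

/* Nodes 0 and 1 are online, nodes 2 and 3 are offline in this model. */
static const bool node_online[MAX_NODES] = { true, true, false, false };
static int node_load[MAX_NODES];   /* per-node hit counts gathered by a scan */
static int last_target = -1;       /* remembered across calls to spread ties */

static bool is_online(int nid)
{
    assert(nid >= 0 && nid < MAX_NODES);
    return node_online[nid];
}

/* Assumed policy: pick the most-loaded node and rotate among ties. If every
 * load is zero, every node ties, so the rotation can reach an offline node;
 * nothing here consults node_online[]. */
static int find_target_node(void)
{
    int target = 0, max = 0;

    for (int nid = 0; nid < MAX_NODES; nid++)
        if (node_load[nid] > max) {
            max = node_load[nid];
            target = nid;
        }
    if (target <= last_target)
        for (int nid = last_target + 1; nid < MAX_NODES; nid++)
            if (node_load[nid] == max) {
                target = nid;
                break;
            }
    last_target = target;
    return target;
}

/* Returns the node a collapse would be attempted on, or -1 if the scan
 * aborts. *why receives the recorded scan result. */
static int scan_pmd(int swap_ptes, int max_swap_ptes, bool buggy,
                    enum scan_result *why)
{
    int ret = 0;                        /* nonzero means "go collapse" */
    enum scan_result result = SCAN_FAIL;

    for (int nid = 0; nid < MAX_NODES; nid++)
        node_load[nid] = 0;

    if (swap_ptes > max_swap_ptes) {
        if (buggy)
            ret = SCAN_EXCEED_SWAP_PTE;    /* the typo: abort code lands in ret */
        else
            result = SCAN_EXCEED_SWAP_PTE; /* the fix: record it in result */
        goto out;
    }

    /* Stand-in for a successful scan that found pages on node 0. */
    node_load[0] = 1;
    ret = 1;
    result = SCAN_SUCCEED;
out:
    *why = result;
    return ret ? find_target_node() : -1;
}
```

In this model, the fixed path scan_pmd(3, 2, false, &why) returns -1 and sets why to SCAN_EXCEED_SWAP_PTE. Starting from a fresh state, three consecutive buggy calls return nodes 0, 1 and 2; node 2 is offline here, which is analogous to the allocation on an offline node that triggers the WARNING in the log.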
0day-ci pushed a commit to 0day-ci/linux that referenced this pull request on Nov 12, 2015
0day-ci pushed a commit to 0day-ci/linux that referenced this pull request on Nov 19, 2015
0day-ci pushed a commit to 0day-ci/linux that referenced this pull request on Nov 26, 2015
0day-ci pushed a commit to 0day-ci/linux that referenced this pull request on Dec 4, 2015
0day-ci pushed a commit to 0day-ci/linux that referenced this pull request on Dec 7, 2015
0day-ci pushed a commit to 0day-ci/linux that referenced this pull request on Dec 9, 2015
ddstreet pushed a commit to ddstreet/linux that referenced this pull request on Dec 10, 2015
This patch fixes a typo in khugepaged_scan_pmd(): instead of setting "result" to SCAN_EXCEED_SWAP_PTE, we set "ret". Setting "ret" makes khugepaged attempt to collapse a huge page even though the intent was to abort the scan. As a result, khugepaged_find_target_node() can be called while every entry in the khugepaged_node_load array is zero. That function does not handle this case and may return an offline node. This leads to a warning followed by a kernel panic:

  WARNING: CPU: 1 PID: 40 at include/linux/gfp.h:314 khugepaged_alloc_page+0xd4/0xf0()
  CPU: 1 PID: 40 Comm: khugepaged Not tainted 4.3.0-rc1-mm1+ #102
   000000000000013a ffff88010ae77b58 ffffffff813270d4 ffffffff818cda31
   0000000000000000 ffff88010ae77b98 ffffffff8107c9f5 dead000000000100
   ffff88010ae77e70 0000000000c752da 0000000000000001 0000000000000000
  Call Trace:
   [<ffffffff813270d4>] dump_stack+0x48/0x64
   [<ffffffff8107c9f5>] warn_slowpath_common+0x95/0xe0
   [<ffffffff8107ca5a>] warn_slowpath_null+0x1a/0x20
   [<ffffffff811ec124>] khugepaged_alloc_page+0xd4/0xf0
   [<ffffffff811f15c8>] collapse_huge_page+0x58/0x550
   [<ffffffff810b38e6>] ? account_entity_dequeue+0xb6/0xd0
   [<ffffffff810b5289>] ? idle_balance+0x79/0x2b0
   [<ffffffff811f1f5e>] khugepaged_scan_pmd+0x49e/0x710
   [<ffffffff810e1f3a>] ? lock_timer_base+0x5a/0x80
   [<ffffffff810e1fbb>] ? try_to_del_timer_sync+0x5b/0x70
   [<ffffffff810e214c>] ? del_timer_sync+0x4c/0x60
   [<ffffffff8168242f>] ? schedule_timeout+0x11f/0x200
   [<ffffffff811f2330>] khugepaged_scan_mm_slot+0x160/0x2a0
   [<ffffffff811f255f>] khugepaged_do_scan+0xef/0x160
   [<ffffffff810bcdb0>] ? wait_woken+0x80/0x80
   [<ffffffff811f25d0>] ? khugepaged_do_scan+0x160/0x160
   [<ffffffff811f25f8>] khugepaged+0x28/0x80
   [<ffffffff8109ab1c>] kthread+0xcc/0xf0
   [<ffffffff810a667e>] ? schedule_tail+0x1e/0xc0
   [<ffffffff8109aa50>] ? kthread_freezable_should_stop+0x70/0x70
   [<ffffffff8168371f>] ret_from_fork+0x3f/0x70
   [<ffffffff8109aa50>] ? kthread_freezable_should_stop+0x70/0x70
  BUG: unable to handle kernel paging request at 0000000000014028
  IP: [<ffffffff81185eb2>] __alloc_pages_nodemask+0xc2/0x2c0
  PGD aaac7067 PUD aaac606 PMD 0
  Oops: 0000 [#1] SMP
  CPU: 1 PID: 40 Comm: khugepaged Tainted: G W 4.3.0-rc1-mm1+ #102
  task: ffff88010ae16400 ti: ffff88010ae74000 task.ti: ffff88010ae74000
  RIP: 0010:[<ffffffff81185eb2>] [<ffffffff81185eb2>] __alloc_pages_nodemask+0xc2/0x2c0
  RSP: 0018:ffff88010ae77ad8 EFLAGS: 00010246
  RAX: 0000000000000000 RBX: 0000000000014020 RCX: 0000000000000014
  RDX: 0000000000000000 RSI: 0000000000000009 RDI: 0000000000c752da
  RBP: ffff88010ae77ba8 R08: 0000000000000000 R09: 0000000000000001
  R10: 0000000000000000 R11: 0000000000000297 R12: 0000000000000000
  R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000c752da
  FS: 0000000000000000(0000) GS:ffff88010be40000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 0000000000014028 CR3: 00000000aaac4000 CR4: 00000000000006e0
  Stack:
   ffff88010ae77ae8 ffffffff810d0b3b ffff88010ae77b48 ffffffff81179e73
   0000000000000010 ffff88010ae77b58 ffff88010ae77b18 ffffffff811ec124
   ffff88010ae77b38 00000009a6e3aff4 0000000000000000 0000000000000000
  Call Trace:
   [<ffffffff810d0b3b>] ? vprintk_default+0x2b/0x40
   [<ffffffff81179e73>] ? printk+0x46/0x48
   [<ffffffff811ec124>] ? khugepaged_alloc_page+0xd4/0xf0
   [<ffffffff8107ca04>] ? warn_slowpath_common+0xa4/0xe0
   [<ffffffff811ec0cd>] khugepaged_alloc_page+0x7d/0xf0
   [<ffffffff811f15c8>] collapse_huge_page+0x58/0x550
   [<ffffffff810b38e6>] ? account_entity_dequeue+0xb6/0xd0
   [<ffffffff810b5289>] ? idle_balance+0x79/0x2b0
   [<ffffffff811f1f5e>] khugepaged_scan_pmd+0x49e/0x710
   [<ffffffff810e1f3a>] ? lock_timer_base+0x5a/0x80
   [<ffffffff810e1fbb>] ? try_to_del_timer_sync+0x5b/0x70
   [<ffffffff810e214c>] ? del_timer_sync+0x4c/0x60
   [<ffffffff8168242f>] ? schedule_timeout+0x11f/0x200
   [<ffffffff811f2330>] khugepaged_scan_mm_slot+0x160/0x2a0
   [<ffffffff811f255f>] khugepaged_do_scan+0xef/0x160
   [<ffffffff810bcdb0>] ? wait_woken+0x80/0x80
   [<ffffffff811f25d0>] ? khugepaged_do_scan+0x160/0x160
   [<ffffffff811f25f8>] khugepaged+0x28/0x80
   [<ffffffff8109ab1c>] kthread+0xcc/0xf0
   [<ffffffff810a667e>] ? schedule_tail+0x1e/0xc0
   [<ffffffff8109aa50>] ? kthread_freezable_should_stop+0x70/0x70
   [<ffffffff8168371f>] ret_from_fork+0x3f/0x70
   [<ffffffff8109aa50>] ? kthread_freezable_should_stop+0x70/0x70
  RIP [<ffffffff81185eb2>] __alloc_pages_nodemask+0xc2/0x2c0
   RSP <ffff88010ae77ad8>
  CR2: 0000000000014028

Fixes: acc067d ("mm: make optimistic check for swapin readahead")
Signed-off-by: Vladimir Davydov <[email protected]>
Cc: Ebru Akagunduz <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
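The failure mode can be illustrated with a small userspace model. This is a sketch, not kernel code: MAX_NODES, node_load, node_online, find_target_node() and scan() are hypothetical stand-ins for khugepaged_node_load, khugepaged_find_target_node() and khugepaged_scan_pmd(), and the tie-breaking loop reflects our reading of how the target-node helper behaved at the time, assumed rather than quoted. It shows how a stray non-zero "ret" turns an aborted scan into a collapse attempt, and how an all-zero load array lets the tie-breaking step land on a node that is not online.

```c
#include <assert.h>
#include <string.h>

/*
 * Simplified userspace model of the bug, NOT kernel code.
 * All names here are hypothetical stand-ins for khugepaged internals.
 */
#define MAX_NODES 4
#define NO_NODE   (-1)

enum scan_result { SCAN_FAIL, SCAN_SUCCEED, SCAN_EXCEED_SWAP_PTE };

/* Assumption: a machine on which only node 0 is online. */
static const int node_online[MAX_NODES] = { 1, 0, 0, 0 };

static int node_load[MAX_NODES];       /* pages seen per node in this scan */
static int last_target_node = NO_NODE; /* persists across scans */

/* Choose the most-loaded node, rotating through nodes that tie. */
static int find_target_node(void)
{
	int nid, target = 0, max_value = 0;

	for (nid = 0; nid < MAX_NODES; nid++)
		if (node_load[nid] > max_value) {
			max_value = node_load[nid];
			target = nid;
		}

	/* With an all-zero array every node ties, online or not. */
	if (target <= last_target_node)
		for (nid = last_target_node + 1; nid < MAX_NODES; nid++)
			if (node_load[nid] == max_value) {
				target = nid;
				break;
			}

	/* The index is in range, but nothing checks that it is online. */
	assert(target >= 0 && target < MAX_NODES);
	last_target_node = target;
	return target;
}

/*
 * Scan a range whose first 'swap_ptes' entries are swap PTEs, so no page
 * is accounted in node_load before the 'max_swap' limit is exceeded.
 * 'ret' non-zero means "collapse"; 'result' records why the scan ended.
 * Returns the node picked for collapse, or NO_NODE if the scan aborted.
 */
static int scan(int swap_ptes, int max_swap, int buggy, enum scan_result *res)
{
	int ret = 0, unmapped = 0, i;
	enum scan_result result = SCAN_FAIL;

	memset(node_load, 0, sizeof(node_load));
	for (i = 0; i < swap_ptes; i++) {
		if (++unmapped > max_swap) {
			if (buggy)
				ret = SCAN_EXCEED_SWAP_PTE;    /* typo: non-zero ret => collapse */
			else
				result = SCAN_EXCEED_SWAP_PTE; /* fix: record reason, abort */
			goto out;
		}
	}
	/* Per-page accounting into node_load would happen here. */
out:
	*res = result;
	return ret ? find_target_node() : NO_NODE;
}
```

In this model, two consecutive buggy calls such as scan(3, 1, 1, &r) return node 0 and then node 1, which node_online marks as offline; with the fix, scan() returns NO_NODE and reports SCAN_EXCEED_SWAP_PTE through *res. The rotation through tied nodes is what makes the all-zero array dangerous: every node, online or not, matches max_value == 0.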
0day-ci pushed a commit to 0day-ci/linux that referenced this pull request Dec 10, 2015
0day-ci pushed a commit to 0day-ci/linux that referenced this pull request Dec 11, 2015
ddstreet pushed a commit to ddstreet/linux that referenced this pull request Dec 11, 2015
0day-ci pushed a commit to 0day-ci/linux that referenced this pull request Dec 18, 2015
0day-ci pushed a commit to 0day-ci/linux that referenced this pull request Jan 1, 2016
0day-ci pushed a commit to 0day-ci/linux that referenced this pull request Jan 6, 2016
0day-ci pushed a commit to 0day-ci/linux that referenced this pull request Jan 13, 2016
0day-ci
pushed a commit
to 0day-ci/linux
that referenced
this pull request
Jan 14, 2016
0day-ci
pushed a commit
to 0day-ci/linux
that referenced
this pull request
Jan 15, 2016
0day-ci
pushed a commit
to 0day-ci/linux
that referenced
this pull request
Jan 21, 2016
0day-ci
pushed a commit
to 0day-ci/linux
that referenced
this pull request
Jan 22, 2016
0day-ci
pushed a commit
to 0day-ci/linux
that referenced
this pull request
Jan 28, 2016
0day-ci
pushed a commit
to 0day-ci/linux
that referenced
this pull request
Feb 1, 2016
rgushchin
added a commit
to rgushchin/linux
that referenced
this pull request
Oct 11, 2025
Currently there is a hard-coded list of possible oom constraints:
NONE, CPUSET, MEMORY_POLICY & MEMCG. Add a new one: CONSTRAINT_BPF.
Also, add the ability to specify a custom constraint name
when calling bpf_out_of_memory(). If an empty string is passed
as an argument, CONSTRAINT_BPF is displayed.
The resulting output in dmesg will look like this:
[ 315.224875] kworker/u17:0 invoked oom-killer: gfp_mask=0x0(), order=0, oom_score_adj=0
oom_policy=default
[ 315.226532] CPU: 1 UID: 0 PID: 74 Comm: kworker/u17:0 Not tainted 6.16.0-00015-gf09eb0d6badc #102 PREEMPT(full)
[ 315.226534] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-5.fc42 04/01/2014
[ 315.226536] Workqueue: bpf_psi_wq bpf_psi_handle_event_fn
[ 315.226542] Call Trace:
[ 315.226545] <TASK>
[ 315.226548] dump_stack_lvl+0x4d/0x70
[ 315.226555] dump_header+0x59/0x1c6
[ 315.226561] oom_kill_process.cold+0x8/0xef
[ 315.226565] out_of_memory+0x111/0x5c0
[ 315.226577] bpf_out_of_memory+0x6f/0xd0
[ 315.226580] ? srso_alias_return_thunk+0x5/0xfbef5
[ 315.226589] bpf_prog_3018b0cf55d2c6bb_handle_psi_event+0x5d/0x76
[ 315.226594] bpf__bpf_psi_ops_handle_psi_event+0x47/0xa7
[ 315.226599] bpf_psi_handle_event_fn+0x63/0xb0
[ 315.226604] process_one_work+0x1fc/0x580
[ 315.226616] ? srso_alias_return_thunk+0x5/0xfbef5
[ 315.226624] worker_thread+0x1d9/0x3b0
[ 315.226629] ? __pfx_worker_thread+0x10/0x10
[ 315.226632] kthread+0x128/0x270
[ 315.226637] ? lock_release+0xd4/0x2d0
[ 315.226645] ? __pfx_kthread+0x10/0x10
[ 315.226649] ret_from_fork+0x81/0xd0
[ 315.226652] ? __pfx_kthread+0x10/0x10
[ 315.226655] ret_from_fork_asm+0x1a/0x30
[ 315.226667] </TASK>
[ 315.239745] memory: usage 42240kB, limit 9007199254740988kB, failcnt 0
[ 315.240231] swap: usage 0kB, limit 0kB, failcnt 0
[ 315.240585] Memory cgroup stats for /cgroup-test-work-dir673/oom_test/cg2:
[ 315.240603] anon 42897408
[ 315.241317] file 0
[ 315.241493] kernel 98304
...
[ 315.255946] Tasks state (memory values in pages):
[ 315.256292] [ pid ] uid tgid total_vm rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
[ 315.257107] [ 675] 0 675 162013 10969 10712 257 0 155648 0 0 test_progs
[ 315.257927] oom-kill:constraint=CONSTRAINT_BPF_PSI_MEM,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/cgroup-test-work-dir673/oom_test/cg2,task_memcg=/cgroup-test-work-dir673/oom_test/cg2,task=test_progs,pid=675,uid=0
[ 315.259371] Memory cgroup out of memory: Killed process 675 (test_progs) total-vm:648052kB, anon-rss:42848kB, file-rss:1028kB, shmem-rss:0kB, UID:0 pgtables:152kB oom_score_adj:0
Signed-off-by: Roman Gushchin <[email protected]>
rgushchin added a commit to rgushchin/linux that referenced this pull request on Oct 15, 2025
Currently there is a hard-coded list of possible OOM constraints:
NONE, CPUSET, MEMORY_POLICY and MEMCG. Add a new one: CONSTRAINT_BPF.
Also add the ability to specify a custom constraint name
when calling bpf_out_of_memory(). If an empty string is passed
as the argument, CONSTRAINT_BPF is displayed.
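The naming rule can be sketched as a small userspace C model. It is illustrative only and not the kernel implementation. The label table is modelled on oom_constraint_text[] in mm/oom_kill.c, and oom_constraint_name() with its bpf_constraint_name parameter is a hypothetical helper introduced for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative model only: the enum follows the constraint list in the
 * commit message; oom_constraint_name() is a hypothetical helper. */
enum oom_constraint {
	CONSTRAINT_NONE,
	CONSTRAINT_CPUSET,
	CONSTRAINT_MEMORY_POLICY,
	CONSTRAINT_MEMCG,
	CONSTRAINT_BPF,		/* the newly added constraint */
};

static const char *const oom_constraint_text[] = {
	[CONSTRAINT_NONE] = "CONSTRAINT_NONE",
	[CONSTRAINT_CPUSET] = "CONSTRAINT_CPUSET",
	[CONSTRAINT_MEMORY_POLICY] = "CONSTRAINT_MEMORY_POLICY",
	[CONSTRAINT_MEMCG] = "CONSTRAINT_MEMCG",
	[CONSTRAINT_BPF] = "CONSTRAINT_BPF",
};

/* Returns the label printed as constraint=<...> in the OOM report.
 * For CONSTRAINT_BPF, a non-empty custom name (as passed to
 * bpf_out_of_memory()) replaces the generic label; NULL or "" falls
 * back to "CONSTRAINT_BPF". Other constraints ignore the custom name. */
static const char *oom_constraint_name(enum oom_constraint c,
				       const char *bpf_constraint_name)
{
	assert(c <= CONSTRAINT_BPF);
	if (c == CONSTRAINT_BPF && bpf_constraint_name != NULL &&
	    bpf_constraint_name[0] != '\0')
		return bpf_constraint_name;
	return oom_constraint_text[c];
}
```

With this model, an empty or missing name yields "CONSTRAINT_BPF", and a non-empty name such as "CONSTRAINT_BPF_PSI_MEM" is printed as-is.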
The resulting output in dmesg will look like this:
[ 315.224875] kworker/u17:0 invoked oom-killer: gfp_mask=0x0(), order=0, oom_score_adj=0
oom_policy=default
[ 315.226532] CPU: 1 UID: 0 PID: 74 Comm: kworker/u17:0 Not tainted 6.16.0-00015-gf09eb0d6badc #102 PREEMPT(full)
[ 315.226534] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-5.fc42 04/01/2014
[ 315.226536] Workqueue: bpf_psi_wq bpf_psi_handle_event_fn
[ 315.226542] Call Trace:
[ 315.226545] <TASK>
[ 315.226548] dump_stack_lvl+0x4d/0x70
[ 315.226555] dump_header+0x59/0x1c6
[ 315.226561] oom_kill_process.cold+0x8/0xef
[ 315.226565] out_of_memory+0x111/0x5c0
[ 315.226577] bpf_out_of_memory+0x6f/0xd0
[ 315.226580] ? srso_alias_return_thunk+0x5/0xfbef5
[ 315.226589] bpf_prog_3018b0cf55d2c6bb_handle_psi_event+0x5d/0x76
[ 315.226594] bpf__bpf_psi_ops_handle_psi_event+0x47/0xa7
[ 315.226599] bpf_psi_handle_event_fn+0x63/0xb0
[ 315.226604] process_one_work+0x1fc/0x580
[ 315.226616] ? srso_alias_return_thunk+0x5/0xfbef5
[ 315.226624] worker_thread+0x1d9/0x3b0
[ 315.226629] ? __pfx_worker_thread+0x10/0x10
[ 315.226632] kthread+0x128/0x270
[ 315.226637] ? lock_release+0xd4/0x2d0
[ 315.226645] ? __pfx_kthread+0x10/0x10
[ 315.226649] ret_from_fork+0x81/0xd0
[ 315.226652] ? __pfx_kthread+0x10/0x10
[ 315.226655] ret_from_fork_asm+0x1a/0x30
[ 315.226667] </TASK>
[ 315.239745] memory: usage 42240kB, limit 9007199254740988kB, failcnt 0
[ 315.240231] swap: usage 0kB, limit 0kB, failcnt 0
[ 315.240585] Memory cgroup stats for /cgroup-test-work-dir673/oom_test/cg2:
[ 315.240603] anon 42897408
[ 315.241317] file 0
[ 315.241493] kernel 98304
...
[ 315.255946] Tasks state (memory values in pages):
[ 315.256292] [ pid ] uid tgid total_vm rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
[ 315.257107] [ 675] 0 675 162013 10969 10712 257 0 155648 0 0 test_progs
[ 315.257927] oom-kill:constraint=CONSTRAINT_BPF_PSI_MEM,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/cgroup-test-work-dir673/oom_test/cg2,task_memcg=/cgroup-test-work-dir673/oom_test/cg2,task=test_progs,pid=675,uid=0
[ 315.259371] Memory cgroup out of memory: Killed process 675 (test_progs) total-vm:648052kB, anon-rss:42848kB, file-rss:1028kB, shmem-rss:0kB, UID:0 pgtables:152kB oom_score_adj:0
Signed-off-by: Roman Gushchin <[email protected]>
rgushchin added a commit to rgushchin/linux that referenced this pull request on Oct 15, 2025
Introduce a bpf struct ops for implementing custom OOM handling policies.
The struct ops provides the bpf_handle_out_of_memory() callback,
which is expected to return 1 if it was able to free some memory and 0
otherwise.
In the latter case, the in-kernel OOM killer is guaranteed to
be invoked. Otherwise, the kernel also checks the bpf_memory_freed
field of the oom_control structure, which is expected to be set by
kfuncs suitable for releasing memory. This is a safety mechanism that
prevents a BPF program from claiming forward progress without actually
releasing memory. The callback program is sleepable to enable the use of
iterators, e.g. cgroup iterators.
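A minimal userspace model of the return-value contract described above, assuming bpf_memory_freed starts out false and is set only by memory-releasing kfuncs. struct oom_control_model, bpf_oom_handled() and the example handlers are hypothetical names for this sketch; the real kernel logic may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct oom_control: only the field discussed
 * here. In this model it starts out false and is set to true by the
 * (simulated) memory-releasing kfuncs. */
struct oom_control_model {
	bool bpf_memory_freed;
};

/* Model of the bpf_handle_out_of_memory() callback signature. */
typedef int (*bpf_oom_handler_fn)(struct oom_control_model *oc);

/* Returns true if the BPF handler is credited with handling the OOM, so
 * the in-kernel OOM killer is skipped; false means it must run. */
static bool bpf_oom_handled(bpf_oom_handler_fn handler,
			    struct oom_control_model *oc)
{
	assert(handler != NULL && oc != NULL);
	if (handler(oc) != 1)
		return false;		/* 0: the in-kernel OOM killer runs */
	/* Safety net: a return value of 1 is only trusted if a
	 * memory-releasing kfunc actually recorded progress. */
	return oc->bpf_memory_freed;
}

/* Example handlers for exercising the model. */
static int handler_frees_memory(struct oom_control_model *oc)
{
	oc->bpf_memory_freed = true;	/* as if a kfunc released memory */
	return 1;
}

static int handler_false_claim(struct oom_control_model *oc)
{
	(void)oc;
	return 1;			/* claims progress, frees nothing */
}

static int handler_gives_up(struct oom_control_model *oc)
{
	(void)oc;
	return 0;
}
```

The handler_false_claim() case shows why the extra check exists: returning 1 without a kfunc recording freed memory still leads to the in-kernel OOM killer.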
The callback receives struct oom_control as an argument, so it can
determine the scope of the OOM event: whether it is a memcg-wide or
system-wide OOM.
The callback is executed just before the kernel's victim task selection
algorithm, so all heuristics and sysctls, such as sysctl_panic_on_oom
and sysctl_oom_kill_allocating_task, are respected.
The struct ops also has a name field, which allows defining a
custom name for the implemented policy. It's printed in the OOM report
in the oom_policy=<policy> format. "default" is printed if BPF is not
used or the policy name is not specified.
[ 112.696676] test_progs invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
oom_policy=bpf_test_policy
[ 112.698160] CPU: 1 UID: 0 PID: 660 Comm: test_progs Not tainted 6.16.0-00015-gf09eb0d6badc #102 PREEMPT(full)
[ 112.698165] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-5.fc42 04/01/2014
[ 112.698167] Call Trace:
[ 112.698177] <TASK>
[ 112.698182] dump_stack_lvl+0x4d/0x70
[ 112.698192] dump_header+0x59/0x1c6
[ 112.698199] oom_kill_process.cold+0x8/0xef
[ 112.698206] bpf_oom_kill_process+0x59/0xb0
[ 112.698216] bpf_prog_7ecad0f36a167fd7_test_out_of_memory+0x2be/0x313
[ 112.698229] bpf__bpf_oom_ops_handle_out_of_memory+0x47/0xaf
[ 112.698236] ? srso_alias_return_thunk+0x5/0xfbef5
[ 112.698240] bpf_handle_oom+0x11a/0x1e0
[ 112.698250] out_of_memory+0xab/0x5c0
[ 112.698258] mem_cgroup_out_of_memory+0xbc/0x110
[ 112.698274] try_charge_memcg+0x4b5/0x7e0
[ 112.698288] charge_memcg+0x2f/0xc0
[ 112.698293] __mem_cgroup_charge+0x30/0xc0
[ 112.698299] do_anonymous_page+0x40f/0xa50
[ 112.698311] __handle_mm_fault+0xbba/0x1140
[ 112.698317] ? srso_alias_return_thunk+0x5/0xfbef5
[ 112.698335] handle_mm_fault+0xe6/0x370
[ 112.698343] do_user_addr_fault+0x211/0x6a0
[ 112.698354] exc_page_fault+0x75/0x1d0
[ 112.698363] asm_exc_page_fault+0x26/0x30
[ 112.698366] RIP: 0033:0x7fa97236db00
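The "default" fallback for the policy name amounts to a simple selection. This is a hedged sketch: struct bpf_oom_ops_model and oom_policy_name() are hypothetical, and a NULL ops pointer stands in for "BPF is not used".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the BPF OOM struct ops; only the name
 * field described above is modelled. */
struct bpf_oom_ops_model {
	const char *name;	/* may be NULL or "" if not specified */
};

static const char oom_policy_default[] = "default";

/* Returns the string printed as oom_policy=<policy> in the OOM report.
 * ops == NULL models "BPF is not used". */
static const char *oom_policy_name(const struct bpf_oom_ops_model *ops)
{
	if (ops == NULL || ops->name == NULL || ops->name[0] == '\0')
		return oom_policy_default;
	return ops->name;
}
```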
It's possible to load one bpf_oom_ops for the system and one
bpf_oom_ops for every memory cgroup. In case of a memcg OOM, the
cgroup tree is traversed from the OOM'ing memcg up to the root and
the corresponding BPF OOM handlers are executed until some memory is
freed. If no memory is freed, the kernel OOM killer is invoked.
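The memcg traversal described above can be modelled with a parent-linked list of cgroups. This sketch is illustrative only: struct memcg_model, bpf_memcg_oom_walk() and the example handlers are hypothetical, a handler's bool result stands in for "some memory was freed", and the system-wide bpf_oom_ops is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical memcg node with at most one attached BPF OOM handler. */
struct memcg_model {
	struct memcg_model *parent;	/* NULL for the root */
	bool (*handle_oom)(struct memcg_model *memcg);	/* NULL if none */
};

/* Walk from the OOM'ing memcg up to the root, running attached handlers
 * until one frees memory. Returns the memcg whose handler succeeded, or
 * NULL if none did and the in-kernel OOM killer must be invoked. */
static struct memcg_model *bpf_memcg_oom_walk(struct memcg_model *oom_memcg)
{
	for (struct memcg_model *m = oom_memcg; m != NULL; m = m->parent) {
		if (m->handle_oom != NULL && m->handle_oom(m))
			return m;
	}
	return NULL;
}

/* Example handlers for exercising the model. */
static bool frees_memory(struct memcg_model *m)
{
	(void)m;
	return true;
}

static bool frees_nothing(struct memcg_model *m)
{
	(void)m;
	return false;
}
```

A NULL result corresponds to the fallback to the kernel OOM killer.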
Signed-off-by: Roman Gushchin <[email protected]>
rgushchin added a commit to rgushchin/linux that referenced this pull request on Oct 15, 2025
rgushchin added a commit to rgushchin/linux that referenced this pull request on Oct 16, 2025
rgushchin added a commit to rgushchin/linux that referenced this pull request on Oct 16, 2025
rgushchin added a commit to rgushchin/linux that referenced this pull request on Oct 16, 2025
Introduce a bpf struct ops for implementing custom OOM handling policies.
The struct ops provides the bpf_handle_out_of_memory() callback,
which is expected to return 1 if it was able to free some memory and 0
otherwise.
In the latter case, the in-kernel OOM killer is guaranteed to
be invoked. Otherwise, the kernel also checks the bpf_memory_freed
field of the oom_control structure, which is expected to be set by
kfuncs suitable for releasing memory. This is a safety mechanism that
prevents a BPF program from claiming forward progress without actually
releasing memory. The callback program is sleepable to enable the use of
iterators, e.g. cgroup iterators.
The callback receives struct oom_control as an argument, so it can
determine the scope of the OOM event: whether it is a memcg-wide or
system-wide OOM.
The callback is executed just before the kernel's victim task selection
algorithm, so all heuristics and sysctls, such as sysctl_panic_on_oom
and sysctl_oom_kill_allocating_task, are respected.
The struct ops also has a name field, which allows defining a
custom name for the implemented policy. It's printed in the OOM report
in the oom_policy=<policy> format. "default" is printed if BPF is not
used or the policy name is not specified.
[ 112.696676] test_progs invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
oom_policy=bpf_test_policy
[ 112.698160] CPU: 1 UID: 0 PID: 660 Comm: test_progs Not tainted 6.16.0-00015-gf09eb0d6badc #102 PREEMPT(full)
[ 112.698165] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-5.fc42 04/01/2014
[ 112.698167] Call Trace:
[ 112.698177] <TASK>
[ 112.698182] dump_stack_lvl+0x4d/0x70
[ 112.698192] dump_header+0x59/0x1c6
[ 112.698199] oom_kill_process.cold+0x8/0xef
[ 112.698206] bpf_oom_kill_process+0x59/0xb0
[ 112.698216] bpf_prog_7ecad0f36a167fd7_test_out_of_memory+0x2be/0x313
[ 112.698229] bpf__bpf_oom_ops_handle_out_of_memory+0x47/0xaf
[ 112.698236] ? srso_alias_return_thunk+0x5/0xfbef5
[ 112.698240] bpf_handle_oom+0x11a/0x1e0
[ 112.698250] out_of_memory+0xab/0x5c0
[ 112.698258] mem_cgroup_out_of_memory+0xbc/0x110
[ 112.698274] try_charge_memcg+0x4b5/0x7e0
[ 112.698288] charge_memcg+0x2f/0xc0
[ 112.698293] __mem_cgroup_charge+0x30/0xc0
[ 112.698299] do_anonymous_page+0x40f/0xa50
[ 112.698311] __handle_mm_fault+0xbba/0x1140
[ 112.698317] ? srso_alias_return_thunk+0x5/0xfbef5
[ 112.698335] handle_mm_fault+0xe6/0x370
[ 112.698343] do_user_addr_fault+0x211/0x6a0
[ 112.698354] exc_page_fault+0x75/0x1d0
[ 112.698363] asm_exc_page_fault+0x26/0x30
[ 112.698366] RIP: 0033:0x7fa97236db00
It's possible to load one bpf_oom_ops for the system and one
bpf_oom_ops for every memory cgroup. In case of a memcg OOM, the
cgroup tree is traversed from the OOM'ing memcg up to the root and
the corresponding BPF OOM handlers are executed until some memory is
freed. If no memory is freed, the kernel OOM killer is invoked.
Signed-off-by: Roman Gushchin <[email protected]>
merge with bpf oom
require bpf_handle_out_of_memory
merge with oom policy name
rgushchin added a commit to rgushchin/linux that referenced this pull request on Oct 16, 2025
rgushchin added a commit to rgushchin/linux that referenced this pull request on Oct 18, 2025
rgushchin added a commit to rgushchin/linux that referenced this pull request on Oct 18, 2025
rgushchin added a commit to rgushchin/linux that referenced this pull request on Oct 18, 2025
rgushchin added a commit to rgushchin/linux that referenced this pull request on Oct 18, 2025
Currently there is a hard-coded list of possible oom constraints:
NONE, CPUSET, MEMORY_POLICY & MEMCG. Add a new one: CONSTRAINT_BPF.
Also, add an ability to specify a custom constraint name
when calling bpf_out_of_memory(). If an empty string is passed
as an argument, CONSTRAINT_BPF is displayed.
The resulting output in dmesg will look like this:
[ 315.224875] kworker/u17:0 invoked oom-killer: gfp_mask=0x0(), order=0, oom_score_adj=0
oom_policy=default
[ 315.226532] CPU: 1 UID: 0 PID: 74 Comm: kworker/u17:0 Not tainted 6.16.0-00015-gf09eb0d6badc torvalds#102 PREEMPT(full)
[ 315.226534] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-5.fc42 04/01/2014
[ 315.226536] Workqueue: bpf_psi_wq bpf_psi_handle_event_fn
[ 315.226542] Call Trace:
[ 315.226545] <TASK>
[ 315.226548] dump_stack_lvl+0x4d/0x70
[ 315.226555] dump_header+0x59/0x1c6
[ 315.226561] oom_kill_process.cold+0x8/0xef
[ 315.226565] out_of_memory+0x111/0x5c0
[ 315.226577] bpf_out_of_memory+0x6f/0xd0
[ 315.226580] ? srso_alias_return_thunk+0x5/0xfbef5
[ 315.226589] bpf_prog_3018b0cf55d2c6bb_handle_psi_event+0x5d/0x76
[ 315.226594] bpf__bpf_psi_ops_handle_psi_event+0x47/0xa7
[ 315.226599] bpf_psi_handle_event_fn+0x63/0xb0
[ 315.226604] process_one_work+0x1fc/0x580
[ 315.226616] ? srso_alias_return_thunk+0x5/0xfbef5
[ 315.226624] worker_thread+0x1d9/0x3b0
[ 315.226629] ? __pfx_worker_thread+0x10/0x10
[ 315.226632] kthread+0x128/0x270
[ 315.226637] ? lock_release+0xd4/0x2d0
[ 315.226645] ? __pfx_kthread+0x10/0x10
[ 315.226649] ret_from_fork+0x81/0xd0
[ 315.226652] ? __pfx_kthread+0x10/0x10
[ 315.226655] ret_from_fork_asm+0x1a/0x30
[ 315.226667] </TASK>
[ 315.239745] memory: usage 42240kB, limit 9007199254740988kB, failcnt 0
[ 315.240231] swap: usage 0kB, limit 0kB, failcnt 0
[ 315.240585] Memory cgroup stats for /cgroup-test-work-dir673/oom_test/cg2:
[ 315.240603] anon 42897408
[ 315.241317] file 0
[ 315.241493] kernel 98304
...
[ 315.255946] Tasks state (memory values in pages):
[ 315.256292] [ pid ] uid tgid total_vm rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
[ 315.257107] [ 675] 0 675 162013 10969 10712 257 0 155648 0 0 test_progs
[ 315.257927] oom-kill:constraint=CONSTRAINT_BPF_PSI_MEM,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/cgroup-test-work-dir673/oom_test/cg2,task_memcg=/cgroup-test-work-dir673/oom_test/cg2,task=test_progs,pid=675,uid=0
[ 315.259371] Memory cgroup out of memory: Killed process 675 (test_progs) total-vm:648052kB, anon-rss:42848kB, file-rss:1028kB, shmem-rss:0kB, UID:0 pgtables:152kB oom_score_adj:0
Signed-off-by: Roman Gushchin <[email protected]>
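The report uses two units for the same memory:
- The "Tasks state" table counts pages.
- The "Killed process" line reports kB.

A minimal C sketch of the conversion follows. It assumes 4 KiB pages (the usual x86-64 PAGE_SIZE, taken here as an assumption for this QEMU guest). The helper names are hypothetical.

```c
/* Assumption: 4 KiB pages, i.e. PAGE_SIZE == 4096 on this x86-64 guest. */
#define ASSUMED_PAGE_SIZE_KB 4UL

/* Convert a page count from the "Tasks state" table into kB. */
static unsigned long pages_to_kb(unsigned long pages)
{
	return pages * ASSUMED_PAGE_SIZE_KB;
}

/* pgtables_bytes is reported in bytes; the kill message prints it in kB. */
static unsigned long bytes_to_kb(unsigned long bytes)
{
	return bytes / 1024UL;
}
```

With the values for PID 675, the table and the kill message agree:
- rss_anon 10712 pages is 42848 kB (anon-rss).
- rss_file 257 pages is 1028 kB (file-rss).
- total_vm 162013 pages is 648052 kB (total-vm).
- pgtables_bytes 155648 is 152 kB (pgtables).

rss (10969) is the sum of rss_anon, rss_file and rss_shmem.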
rgushchin
added a commit
to rgushchin/linux
that referenced
this pull request
Oct 18, 2025
Introduce a bpf struct ops for implementing custom OOM handling
policies.
It's possible to load one bpf_oom_ops for the system and one
bpf_oom_ops for every memory cgroup. In case of a memcg OOM, the
cgroup tree is traversed from the OOM'ing memcg up to the root and
the corresponding BPF OOM handlers are executed until some memory is
freed. If no memory is freed, the kernel OOM killer is invoked.
The struct ops provides the bpf_handle_out_of_memory() callback,
which is expected to return 1 if it was able to free some memory and 0
otherwise. If 1 is returned, the kernel also checks the bpf_memory_freed
field of the oom_control structure, which is expected to be set by
kfuncs suitable for releasing memory. If both are set, the OOM is
considered handled; otherwise the next OOM handler in the chain
(e.g. the BPF OOM handler attached to the parent cgroup or the in-kernel
OOM killer) is executed.
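The handler chain described above can be modeled in plain user-space C. This is a minimal sketch, not the patch's code, and it rests on these assumptions:
- model_memcg, model_oom_control and the model_policy_* handlers are hypothetical stand-ins.
- The system-wide bpf_oom_ops is modeled as the handler attached to the root node.
- Resetting bpf_memory_freed before each handler is a modeling choice.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the one oom_control field used here. */
struct model_oom_control {
	bool bpf_memory_freed; /* set by (modeled) memory-releasing kfuncs */
};

/* Hypothetical handler: returns 1 if it claims to have freed memory, else 0. */
typedef int (*model_oom_handler)(struct model_oom_control *oc);

/* A cgroup tree node; the root's handler models the system-wide one. */
struct model_memcg {
	struct model_memcg *parent; /* NULL for the root */
	model_oom_handler handler;  /* NULL if no policy is attached */
};

/*
 * Walk from the OOM'ing memcg towards the root. The OOM counts as handled
 * only if a handler returns 1 AND bpf_memory_freed was set. Returning
 * false means the in-kernel OOM killer should run.
 */
static bool model_bpf_handle_oom(struct model_memcg *memcg,
				 struct model_oom_control *oc)
{
	for (struct model_memcg *cg = memcg; cg; cg = cg->parent) {
		if (!cg->handler)
			continue;
		oc->bpf_memory_freed = false; /* modeling choice: reset per handler */
		if (cg->handler(oc) == 1 && oc->bpf_memory_freed)
			return true;
	}
	return false;
}

/* Example policy that really releases memory (e.g. a kfunc killed a task). */
static int model_policy_frees(struct model_oom_control *oc)
{
	oc->bpf_memory_freed = true;
	return 1;
}

/* Example policy that claims success but never sets bpf_memory_freed. */
static int model_policy_claims_only(struct model_oom_control *oc)
{
	(void)oc;
	return 1;
}

/* Example policy that declines to act. */
static int model_policy_declines(struct model_oom_control *oc)
{
	(void)oc;
	return 0;
}
```

A policy that returns 1 without actually releasing memory (model_policy_claims_only) does not end the chain. The walk continues to the parent, and if no handler succeeds the caller falls back to the in-kernel OOM killer.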
The bpf_handle_out_of_memory() callback program is sleepable to enable
the use of iterators, e.g. cgroup iterators. The callback receives struct
oom_control as an argument, so it can determine the scope of the OOM
event: whether it is a memcg-wide or a system-wide OOM.
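In the kernel, struct oom_control carries a memcg pointer that is non-NULL for a memcg OOM; mm/oom_kill.c tests exactly this in is_memcg_oom(). A handler could branch on the same field.

The sketch is a user-space model built from hypothetical model_* types. It is not BPF code for the proposed struct ops. The "search root" helper is an illustrative assumption about how a policy might use the scope.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mem_cgroup. */
struct model_mem_cgroup {
	const char *path;
};

/* Hypothetical stand-in for struct oom_control; only the memcg field. */
struct model_oom_control {
	struct model_mem_cgroup *memcg; /* non-NULL for a memcg-wide OOM */
};

/* Mirrors the check done by is_memcg_oom() in mm/oom_kill.c. */
static bool model_is_memcg_oom(const struct model_oom_control *oc)
{
	return oc->memcg != NULL;
}

/* A policy might limit its victim search to the OOM'ing subtree. */
static const char *model_search_root(const struct model_oom_control *oc)
{
	return model_is_memcg_oom(oc) ? oc->memcg->path : "/";
}
```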
The callback is executed just before the kernel's victim task selection
algorithm, so all heuristics and sysctls like panic_on_oom and
sysctl_oom_kill_allocating_task are respected.
The BPF OOM struct ops provides the handle_cgroup_offline() callback,
which is useful for releasing the struct ops when the corresponding
cgroup is gone.
The struct ops also has a name field, which allows defining a
custom name for the implemented policy. It's printed in the OOM report
in the oom_policy=<policy> format. "default" is printed if BPF is not
used or no policy name is specified.
[ 112.696676] test_progs invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
oom_policy=bpf_test_policy
[ 112.698160] CPU: 1 UID: 0 PID: 660 Comm: test_progs Not tainted 6.16.0-00015-gf09eb0d6badc #102 PREEMPT(full)
[ 112.698165] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-5.fc42 04/01/2014
[ 112.698167] Call Trace:
[ 112.698177] <TASK>
[ 112.698182] dump_stack_lvl+0x4d/0x70
[ 112.698192] dump_header+0x59/0x1c6
[ 112.698199] oom_kill_process.cold+0x8/0xef
[ 112.698206] bpf_oom_kill_process+0x59/0xb0
[ 112.698216] bpf_prog_7ecad0f36a167fd7_test_out_of_memory+0x2be/0x313
[ 112.698229] bpf__bpf_oom_ops_handle_out_of_memory+0x47/0xaf
[ 112.698236] ? srso_alias_return_thunk+0x5/0xfbef5
[ 112.698240] bpf_handle_oom+0x11a/0x1e0
[ 112.698250] out_of_memory+0xab/0x5c0
[ 112.698258] mem_cgroup_out_of_memory+0xbc/0x110
[ 112.698274] try_charge_memcg+0x4b5/0x7e0
[ 112.698288] charge_memcg+0x2f/0xc0
[ 112.698293] __mem_cgroup_charge+0x30/0xc0
[ 112.698299] do_anonymous_page+0x40f/0xa50
[ 112.698311] __handle_mm_fault+0xbba/0x1140
[ 112.698317] ? srso_alias_return_thunk+0x5/0xfbef5
[ 112.698335] handle_mm_fault+0xe6/0x370
[ 112.698343] do_user_addr_fault+0x211/0x6a0
[ 112.698354] exc_page_fault+0x75/0x1d0
[ 112.698363] asm_exc_page_fault+0x26/0x30
[ 112.698366] RIP: 0033:0x7fa97236db00
Signed-off-by: Roman Gushchin <[email protected]>
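The commit message above describes a fallback rule for the policy name field. The report above printed oom_policy=bpf_test_policy, and the rule fits in a few lines.

model_format_oom_policy() is a hypothetical user-space helper, not the patch's implementation. It assumes that a NULL or empty name is what "not specified" means.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build the "oom_policy=<policy>" token for the OOM report. A NULL or
 * empty name (no BPF policy, or no name set) falls back to "default".
 * Returns the snprintf() result: the length of the full token.
 */
static int model_format_oom_policy(char *buf, size_t len, const char *name)
{
	if (name == NULL || strlen(name) == 0)
		name = "default";
	return snprintf(buf, len, "oom_policy=%s", name);
}
```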
rgushchin
added a commit
to rgushchin/linux
that referenced
this pull request
Oct 18, 2025
Currently there is a hard-coded list of possible oom constraints:
NONE, CPUSET, MEMORY_POLICY & MEMCG. Add a new one: CONSTRAINT_BPF.
Also, add an ability to specify a custom constraint name
when calling bpf_out_of_memory(). If an empty string is passed
as an argument, CONSTRAINT_BPF is displayed.
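The naming rule for the new constraint can be sketched as a lookup table plus one override. The table is modeled on oom_constraint_text[] in mm/oom_kill.c. The model_* names are hypothetical.

The sketch assumes the custom name is printed verbatim. That is consistent with constraint=CONSTRAINT_BPF_PSI_MEM in the example output that follows.

```c
#include <string.h>

/* The four existing constraints listed above, plus the new one. */
enum model_oom_constraint {
	MODEL_CONSTRAINT_NONE,
	MODEL_CONSTRAINT_CPUSET,
	MODEL_CONSTRAINT_MEMORY_POLICY,
	MODEL_CONSTRAINT_MEMCG,
	MODEL_CONSTRAINT_BPF,
};

/* Modeled on oom_constraint_text[] in mm/oom_kill.c. */
static const char *const model_constraint_text[] = {
	[MODEL_CONSTRAINT_NONE]          = "CONSTRAINT_NONE",
	[MODEL_CONSTRAINT_CPUSET]        = "CONSTRAINT_CPUSET",
	[MODEL_CONSTRAINT_MEMORY_POLICY] = "CONSTRAINT_MEMORY_POLICY",
	[MODEL_CONSTRAINT_MEMCG]         = "CONSTRAINT_MEMCG",
	[MODEL_CONSTRAINT_BPF]           = "CONSTRAINT_BPF",
};

/*
 * Return the text printed after "constraint=". For CONSTRAINT_BPF, a
 * non-empty custom name (as passed to bpf_out_of_memory()) wins; an
 * empty string falls back to "CONSTRAINT_BPF".
 */
static const char *model_constraint_name(enum model_oom_constraint c,
					 const char *custom_name)
{
	if (c == MODEL_CONSTRAINT_BPF && custom_name && strlen(custom_name) > 0)
		return custom_name;
	return model_constraint_text[c];
}
```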
The resulting output in dmesg will look like this:
[ 315.224875] kworker/u17:0 invoked oom-killer: gfp_mask=0x0(), order=0, oom_score_adj=0
oom_policy=default
[ 315.226532] CPU: 1 UID: 0 PID: 74 Comm: kworker/u17:0 Not tainted 6.16.0-00015-gf09eb0d6badc #102 PREEMPT(full)
[ 315.226534] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-5.fc42 04/01/2014
[ 315.226536] Workqueue: bpf_psi_wq bpf_psi_handle_event_fn
[ 315.226542] Call Trace:
[ 315.226545] <TASK>
[ 315.226548] dump_stack_lvl+0x4d/0x70
[ 315.226555] dump_header+0x59/0x1c6
[ 315.226561] oom_kill_process.cold+0x8/0xef
[ 315.226565] out_of_memory+0x111/0x5c0
[ 315.226577] bpf_out_of_memory+0x6f/0xd0
[ 315.226580] ? srso_alias_return_thunk+0x5/0xfbef5
[ 315.226589] bpf_prog_3018b0cf55d2c6bb_handle_psi_event+0x5d/0x76
[ 315.226594] bpf__bpf_psi_ops_handle_psi_event+0x47/0xa7
[ 315.226599] bpf_psi_handle_event_fn+0x63/0xb0
[ 315.226604] process_one_work+0x1fc/0x580
[ 315.226616] ? srso_alias_return_thunk+0x5/0xfbef5
[ 315.226624] worker_thread+0x1d9/0x3b0
[ 315.226629] ? __pfx_worker_thread+0x10/0x10
[ 315.226632] kthread+0x128/0x270
[ 315.226637] ? lock_release+0xd4/0x2d0
[ 315.226645] ? __pfx_kthread+0x10/0x10
[ 315.226649] ret_from_fork+0x81/0xd0
[ 315.226652] ? __pfx_kthread+0x10/0x10
[ 315.226655] ret_from_fork_asm+0x1a/0x30
[ 315.226667] </TASK>
[ 315.239745] memory: usage 42240kB, limit 9007199254740988kB, failcnt 0
[ 315.240231] swap: usage 0kB, limit 0kB, failcnt 0
[ 315.240585] Memory cgroup stats for /cgroup-test-work-dir673/oom_test/cg2:
[ 315.240603] anon 42897408
[ 315.241317] file 0
[ 315.241493] kernel 98304
...
[ 315.255946] Tasks state (memory values in pages):
[ 315.256292] [ pid ] uid tgid total_vm rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
[ 315.257107] [ 675] 0 675 162013 10969 10712 257 0 155648 0 0 test_progs
[ 315.257927] oom-kill:constraint=CONSTRAINT_BPF_PSI_MEM,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/cgroup-test-work-dir673/oom_test/cg2,task_memcg=/cgroup-test-work-dir673/oom_test/cg2,task=test_progs,pid=675,uid=0
[ 315.259371] Memory cgroup out of memory: Killed process 675 (test_progs) total-vm:648052kB, anon-rss:42848kB, file-rss:1028kB, shmem-rss:0kB, UID:0 pgtables:152kB oom_score_adj:0
Signed-off-by: Roman Gushchin <[email protected]>
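A test can check for this output by splitting the comma-separated key=value list after "oom-kill:" with standard string functions.

model_oom_kill_field() is a hypothetical helper, not part of any kernel or selftest API. It assumes no value contains a comma, which holds for the line shown above.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the value of `key` from an "oom-kill:k1=v1,k2=v2,..." line into
 * out. Returns 0 on success, -1 if the key is missing or out is too small.
 */
static int model_oom_kill_field(const char *line, const char *key,
				char *out, size_t outlen)
{
	const char *p = strstr(line, "oom-kill:");
	size_t klen = strlen(key);

	if (!p)
		return -1;
	p += strlen("oom-kill:");
	while (*p) {
		const char *end = strchr(p, ',');
		size_t n = end ? (size_t)(end - p) : strlen(p);

		/* Match "key=" at the start of this comma-separated item. */
		if (n > klen && strncmp(p, key, klen) == 0 && p[klen] == '=') {
			size_t vlen = n - klen - 1;

			if (vlen + 1 > outlen)
				return -1;
			memcpy(out, p + klen + 1, vlen);
			out[vlen] = '\0';
			return 0;
		}
		if (!end)
			break;
		p = end + 1;
	}
	return -1;
}
```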
rgushchin
added a commit
to rgushchin/linux
that referenced
this pull request
Oct 22, 2025
Introduce a bpf struct ops for implementing custom OOM handling
policies.
rgushchin
added a commit
to rgushchin/linux
that referenced
this pull request
Oct 22, 2025
Currently there is a hard-coded list of possible oom constraints:
NONE, CPUSET, MEMORY_POLICY & MEMCG. Add a new one: CONSTRAINT_BPF.
rgushchin
added a commit
to rgushchin/linux
that referenced
this pull request
Oct 23, 2025
Currently there is a hard-coded list of possible oom constraints:
NONE, CPUSET, MEMORY_POLICY & MEMCG. Add a new one: CONSTRAINT_BPF.
rgushchin
added a commit
to rgushchin/linux
that referenced
this pull request
Oct 24, 2025
Introduce a bpf struct ops for implementing custom OOM handling
policies.
rgushchin
added a commit
to rgushchin/linux
that referenced
this pull request
Oct 24, 2025
Currently there is a hard-coded list of possible oom constraints:
NONE, CPUSET, MEMORY_POLICY & MEMCG. Add a new one: CONSTRAINT_BPF.
aotot
pushed a commit
to jove-decompiler/linux
that referenced
this pull request
Oct 26, 2025
After this patch:
#102/1   flow_dissector_classification/ipv4:OK
#102/2   flow_dissector_classification/ipv4_continue_dissect:OK
#102/3   flow_dissector_classification/ipip:OK
#102/4   flow_dissector_classification/gre:OK
#102/5   flow_dissector_classification/port_range:OK
#102/6   flow_dissector_classification/ipv6:OK
#102     flow_dissector_classification:OK
Summary: 1/6 PASSED, 0 SKIPPED, 0 FAILED
Cc: Daniel Borkmann <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
rgushchin
added a commit
to rgushchin/linux
that referenced
this pull request
Oct 27, 2025
Introduce a bpf struct ops for implementing custom OOM handling
policies.
rgushchin
added a commit
to rgushchin/linux
that referenced
this pull request
Oct 27, 2025
Currently there is a hard-coded list of possible oom constraints:
NONE, CPUSET, MEMORY_POLICY & MEMCG. Add a new one: CONSTRAINT_BPF.
rgushchin
added a commit
to rgushchin/linux
that referenced
this pull request
Oct 27, 2025
Currently there is a hard-coded list of possible oom constraints:
NONE, CPUSET, MEMORY_POLICY & MEMCG. Add a new one: CONSTRAINT_BPF.
rgushchin
added a commit
to rgushchin/linux
that referenced
this pull request
Oct 27, 2025
Introduce a bpf struct ops for implementing custom OOM handling
policies.
rgushchin added a commit to rgushchin/linux that referenced this pull request Oct 27, 2025
Currently there is a hard-coded list of possible OOM constraints:
NONE, CPUSET, MEMORY_POLICY & MEMCG. Add a new one: CONSTRAINT_BPF.
Also, add the ability to specify a custom constraint name
when calling bpf_out_of_memory(). If an empty string is passed
as an argument, CONSTRAINT_BPF is displayed.
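The constraint-name selection can be sketched in C. The enum values mirror the list above. The string table, constraint_name() and its bpf_name parameter are illustrative assumptions rather than the patch's actual code.

```c
#include <assert.h>
#include <string.h>

/* Mirrors the constraint list named in the commit message (sketch only). */
enum oom_constraint {
    CONSTRAINT_NONE,
    CONSTRAINT_CPUSET,
    CONSTRAINT_MEMORY_POLICY,
    CONSTRAINT_MEMCG,
    CONSTRAINT_BPF,
};

static const char *const constraint_text[] = {
    [CONSTRAINT_NONE] = "CONSTRAINT_NONE",
    [CONSTRAINT_CPUSET] = "CONSTRAINT_CPUSET",
    [CONSTRAINT_MEMORY_POLICY] = "CONSTRAINT_MEMORY_POLICY",
    [CONSTRAINT_MEMCG] = "CONSTRAINT_MEMCG",
    [CONSTRAINT_BPF] = "CONSTRAINT_BPF",
};

/*
 * Hypothetical helper: the string shown as constraint=... in the oom-kill line.
 * A non-empty custom name replaces CONSTRAINT_BPF; an empty or NULL name
 * falls back to the generic text.
 */
static const char *constraint_name(enum oom_constraint c, const char *bpf_name)
{
    assert((size_t)c < sizeof(constraint_text) / sizeof(constraint_text[0]));
    if (c == CONSTRAINT_BPF && bpf_name != NULL && strlen(bpf_name) > 0)
        return bpf_name;
    return constraint_text[c];
}
```

Passing "CONSTRAINT_BPF_PSI_MEM" yields the constraint=CONSTRAINT_BPF_PSI_MEM value seen in the log below. Passing "" falls back to CONSTRAINT_BPF.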
The resulting output in dmesg will look like this:
[ 315.224875] kworker/u17:0 invoked oom-killer: gfp_mask=0x0(), order=0, oom_score_adj=0
oom_policy=default
[ 315.226532] CPU: 1 UID: 0 PID: 74 Comm: kworker/u17:0 Not tainted 6.16.0-00015-gf09eb0d6badc #102 PREEMPT(full)
[ 315.226534] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-5.fc42 04/01/2014
[ 315.226536] Workqueue: bpf_psi_wq bpf_psi_handle_event_fn
[ 315.226542] Call Trace:
[ 315.226545] <TASK>
[ 315.226548] dump_stack_lvl+0x4d/0x70
[ 315.226555] dump_header+0x59/0x1c6
[ 315.226561] oom_kill_process.cold+0x8/0xef
[ 315.226565] out_of_memory+0x111/0x5c0
[ 315.226577] bpf_out_of_memory+0x6f/0xd0
[ 315.226580] ? srso_alias_return_thunk+0x5/0xfbef5
[ 315.226589] bpf_prog_3018b0cf55d2c6bb_handle_psi_event+0x5d/0x76
[ 315.226594] bpf__bpf_psi_ops_handle_psi_event+0x47/0xa7
[ 315.226599] bpf_psi_handle_event_fn+0x63/0xb0
[ 315.226604] process_one_work+0x1fc/0x580
[ 315.226616] ? srso_alias_return_thunk+0x5/0xfbef5
[ 315.226624] worker_thread+0x1d9/0x3b0
[ 315.226629] ? __pfx_worker_thread+0x10/0x10
[ 315.226632] kthread+0x128/0x270
[ 315.226637] ? lock_release+0xd4/0x2d0
[ 315.226645] ? __pfx_kthread+0x10/0x10
[ 315.226649] ret_from_fork+0x81/0xd0
[ 315.226652] ? __pfx_kthread+0x10/0x10
[ 315.226655] ret_from_fork_asm+0x1a/0x30
[ 315.226667] </TASK>
[ 315.239745] memory: usage 42240kB, limit 9007199254740988kB, failcnt 0
[ 315.240231] swap: usage 0kB, limit 0kB, failcnt 0
[ 315.240585] Memory cgroup stats for /cgroup-test-work-dir673/oom_test/cg2:
[ 315.240603] anon 42897408
[ 315.241317] file 0
[ 315.241493] kernel 98304
...
[ 315.255946] Tasks state (memory values in pages):
[ 315.256292] [ pid ] uid tgid total_vm rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
[ 315.257107] [ 675] 0 675 162013 10969 10712 257 0 155648 0 0 test_progs
[ 315.257927] oom-kill:constraint=CONSTRAINT_BPF_PSI_MEM,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/cgroup-test-work-dir673/oom_test/cg2,task_memcg=/cgroup-test-work-dir673/oom_test/cg2,task=test_progs,pid=675,uid=0
[ 315.259371] Memory cgroup out of memory: Killed process 675 (test_progs) total-vm:648052kB, anon-rss:42848kB, file-rss:1028kB, shmem-rss:0kB, UID:0 pgtables:152kB oom_score_adj:0
Signed-off-by: Roman Gushchin <[email protected]>
rgushchin added a commit to rgushchin/linux that referenced this pull request Nov 4, 2025
Introduce a bpf struct ops for implementing custom OOM handling
policies.
It's possible to load one bpf_oom_ops for the system and one
bpf_oom_ops for every memory cgroup. In case of a memcg OOM, the
cgroup tree is traversed from the OOM'ing memcg up to the root and
corresponding BPF OOM handlers are executed until some memory is
freed. If no memory is freed, the kernel OOM killer is invoked.
The struct ops provides the bpf_handle_out_of_memory() callback,
which is expected to return 1 if it was able to free some memory and 0
otherwise. If 1 is returned, the kernel also checks the bpf_memory_freed
field of the oom_control structure, which is expected to be set by
kfuncs suitable for releasing memory. If both conditions hold, the OOM is
considered handled; otherwise the next OOM handler in the chain
(e.g. BPF OOM attached to the parent cgroup or the in-kernel OOM
killer) is executed.
The bpf_handle_out_of_memory() callback program is sleepable to enable
using iterators, e.g. cgroup iterators. The callback receives struct
oom_control as an argument, so it can determine the scope of the OOM
event: whether it is a memcg-wide or a system-wide OOM.
The callback is executed just before the kernel victim task selection
algorithm, so all heuristics and sysctls like panic_on_oom and
sysctl_oom_kill_allocating_task are respected.
BPF OOM struct ops provides the handle_cgroup_offline() callback,
which is useful for releasing the struct ops when the corresponding
cgroup goes away.
The struct ops also has the name field, which allows defining a
custom name for the implemented policy. It's printed in the OOM report
in the oom_handler=<name> format, but only if a BPF handler is invoked.
[ 112.696676] test_progs invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
oom_handler=bpf_test_policy
[ 112.698160] CPU: 1 UID: 0 PID: 660 Comm: test_progs Not tainted 6.16.0-00015-gf09eb0d6badc #102 PREEMPT(full)
[ 112.698165] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-5.fc42 04/01/2014
[ 112.698167] Call Trace:
[ 112.698177] <TASK>
[ 112.698182] dump_stack_lvl+0x4d/0x70
[ 112.698192] dump_header+0x59/0x1c6
[ 112.698199] oom_kill_process.cold+0x8/0xef
[ 112.698206] bpf_oom_kill_process+0x59/0xb0
[ 112.698216] bpf_prog_7ecad0f36a167fd7_test_out_of_memory+0x2be/0x313
[ 112.698229] bpf__bpf_oom_ops_handle_out_of_memory+0x47/0xaf
[ 112.698236] ? srso_alias_return_thunk+0x5/0xfbef5
[ 112.698240] bpf_handle_oom+0x11a/0x1e0
[ 112.698250] out_of_memory+0xab/0x5c0
[ 112.698258] mem_cgroup_out_of_memory+0xbc/0x110
[ 112.698274] try_charge_memcg+0x4b5/0x7e0
[ 112.698288] charge_memcg+0x2f/0xc0
[ 112.698293] __mem_cgroup_charge+0x30/0xc0
[ 112.698299] do_anonymous_page+0x40f/0xa50
[ 112.698311] __handle_mm_fault+0xbba/0x1140
[ 112.698317] ? srso_alias_return_thunk+0x5/0xfbef5
[ 112.698335] handle_mm_fault+0xe6/0x370
[ 112.698343] do_user_addr_fault+0x211/0x6a0
[ 112.698354] exc_page_fault+0x75/0x1d0
[ 112.698363] asm_exc_page_fault+0x26/0x30
[ 112.698366] RIP: 0033:0x7fa97236db00
Signed-off-by: Roman Gushchin <[email protected]>