forked from osandov/drgn
Merge branch 'master' into '6.0/stage' #9
Merged
drgn was originally my side project, but for a while now it's also been my work project. Update the copyright headers to reflect this, and add a copyright header to various files that were missing it.
It's annoying to have to do value= when creating objects, especially in interactive mode. Let's allow passing in the value positionally so that `Object(prog, "int", value=0)` becomes `Object(prog, "int", 0)`. It's clear enough that this is creating an int with value 0.
The model has always been that drgn Objects are immutable, but for some reason I went through the trouble of allowing __init__() to reinitialize an already initialized Object. Instead, let's fully initialize the Object in __new__() and get rid of __init__().
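To illustrate the two API changes described above (a positional value and full initialization in `__new__()` with no `__init__()`), here is a minimal sketch. `ImmutableObject`, its attribute names, and the string standing in for a program are all hypothetical; this is not drgn's `Object`, which is implemented in C.

```python
class ImmutableObject:
    """Hypothetical stand-in for an immutable value object (not drgn's Object)."""

    __slots__ = ("prog", "type_name", "value")

    def __new__(cls, prog, type_name, value=None):
        self = super().__new__(cls)
        # Bypass our own __setattr__ guard once, during construction only.
        object.__setattr__(self, "prog", prog)
        object.__setattr__(self, "type_name", type_name)
        object.__setattr__(self, "value", value)
        return self

    # No __init__ is defined, so there is no hook that could reinitialize
    # an instance that has already been constructed.

    def __setattr__(self, name, new_value):
        raise AttributeError(f"{type(self).__name__} is immutable")

    def __delattr__(self, name):
        raise AttributeError(f"{type(self).__name__} is immutable")


# The value may be passed by keyword or positionally.
by_keyword = ImmutableObject("prog", "int", value=0)
positional = ImmutableObject("prog", "int", 0)

try:
    positional.value = 1
    mutated = True
except AttributeError:
    mutated = False
```

Because every attribute is set inside `__new__()`, a caller never observes a partially constructed object, and the guard in `__setattr__()` keeps it unchanged afterwards.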
Despite the naming, this is the kernel stack size.
We currently unwind from pt_regs and NT_PRSTATUS using an array of register definitions. It's more flexible and more efficient to do this with an architecture-specific callback. For x86-64, this change also makes us depend on the binary layout rather than member names of struct pt_regs, but that shouldn't matter unless people are defining their own, weird struct pt_regs.
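As a rough illustration of what depending on the binary layout rather than member names means, the sketch below decodes a raw x86-64 `struct pt_regs` buffer by position. It assumes the standard kernel layout of 21 little-endian 64-bit words; the function name and the returned dictionary are hypothetical, and drgn's actual callback is written in C.

```python
import struct

# Order of the 64-bit words in the standard x86-64 struct pt_regs.
PT_REGS_FIELDS = (
    "r15", "r14", "r13", "r12", "bp", "bx", "r11", "r10", "r9", "r8",
    "ax", "cx", "dx", "si", "di", "orig_ax", "ip", "cs", "flags", "sp", "ss",
)
PT_REGS_FORMAT = "<21Q"  # 21 little-endian unsigned 64-bit integers


def initial_registers_from_pt_regs(buf: bytes) -> dict:
    """Pick out the registers needed to start unwinding, purely by offset."""
    values = struct.unpack_from(PT_REGS_FORMAT, buf)
    regs = dict(zip(PT_REGS_FIELDS, values))
    return {"rip": regs["ip"], "rsp": regs["sp"], "rbp": regs["bp"]}


# Usage: a synthetic buffer whose words are 100, 101, ..., 120.
raw = struct.pack(PT_REGS_FORMAT, *range(100, 121))
regs = initial_registers_from_pt_regs(raw)
```

A decoder like this never looks up a member by name, which is why a struct with a different layout under the same member names would be misread.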
drgn has a couple of issues unwinding stack traces for kernel core dumps:

1. It can't unwind the stack for the idle task (PID 0), which commonly appears in core dumps.
2. It uses the PID in PRSTATUS, which is racy and can't actually be trusted.

The solution for both of these is to look up the PRSTATUS note by CPU instead of PID.

For the live kernel, drgn refuses to unwind the stack of tasks in the "R" state. However, the "R" state is running *or runnable*, so in the latter case, we can still unwind the stack. The solution for this is to look at on_cpu for the task instead of the state.
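The two lookups described above can be sketched with plain Python data. The `Task` and `PrstatusNote` classes, their fields, and both helper functions are hypothetical simplifications for illustration, not drgn's data structures or API.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Task:
    pid: int
    cpu: int      # CPU the task is assigned to
    on_cpu: bool  # True only while the task is actually executing
    state: str    # "R" means running *or runnable*


@dataclass(frozen=True)
class PrstatusNote:
    pid: int   # recorded PID; treated as untrustworthy, so never used as a key
    regs: str  # placeholder for the saved register set


def prstatus_for_task(
    notes_by_cpu: Dict[int, PrstatusNote], task: Task
) -> Optional[PrstatusNote]:
    """Core dump case: find the saved registers by CPU, not by PID."""
    if not task.on_cpu:
        return None  # the task was not executing, so no note belongs to it
    return notes_by_cpu.get(task.cpu)


def can_unwind_live(task: Task) -> bool:
    """Live kernel case: only a task that is executing right now is off limits."""
    return not task.on_cpu


# Each CPU has its own idle task, and all of them report PID 0,
# so a lookup keyed by PID could not tell them apart.
notes_by_cpu = {
    0: PrstatusNote(pid=0, regs="cpu0-regs"),
    1: PrstatusNote(pid=0, regs="cpu1-regs"),
}
idle0 = Task(pid=0, cpu=0, on_cpu=True, state="R")
idle1 = Task(pid=0, cpu=1, on_cpu=True, state="R")
runnable = Task(pid=1234, cpu=1, on_cpu=False, state="R")
```

Here `runnable` is in the "R" state but is not executing, so `can_unwind_live(runnable)` allows unwinding where a check on the state alone would refuse.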
After thinking about it some more, I realized that "libdwfl: simplify activation frame logic" breaks the case where during unwinding someone queries isactivation for reasons other than knowing whether to decrement program counter. Revert the patch and refactor "libdwfl: add interface for getting Dwfl_Module and Dwarf_Frame for Dwfl_Frame" to handle it differently.

Based on:

c95081596 size: Also obey radix printing for bsd format.

With the following patches:

configure: Add --disable-programs
configure: Add --disable-shared
libdwfl: add interface for attaching to/detaching from threads
libdwfl: export __libdwfl_frame_reg_get as dwfl_frame_register
libdwfl: add interface for getting Dwfl_Module and Dwarf_Frame for Dwfl_Frame
libdwfl: add interface for evaluating DWARF expressions in a frame
sdimitro
approved these changes
May 21, 2020
prakashsurya
approved these changes
May 21, 2020
delphix-devops-bot
pushed a commit
that referenced
this pull request
Sep 27, 2025
The CI has intermittently been hitting the following test failures on Python 3.8 with Clang:

  ======================================================================
  ERROR: test_task_cpu (tests.linux_kernel.helpers.test_sched.TestSched)
  ----------------------------------------------------------------------
  Traceback (most recent call last):
    File "/home/runner/work/drgn/drgn/tests/linux_kernel/helpers/test_sched.py", line 40, in test_task_cpu
      with fork_and_stop(os.sched_setaffinity, 0, (cpu,)) as (pid, _):
    File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/contextlib.py", line 113, in __enter__
      return next(self.gen)
    File "/home/runner/work/drgn/drgn/tests/linux_kernel/__init__.py", line 203, in fork_and_stop
      ret = pickle.load(pipe_r)
  EOFError: Ran out of input

The EOFError occurs because the forked process segfaults immediately:

  python[132]: segfault at 7f8f87085014 ip 00007f8f891e9774 sp 00007ffccf7acf00 error 4 in ld-linux-x86-64.so.2[16774,7f8f891d5000+2a000] likely on CPU 0 (core 0, socket 0)

The segfault is on dereferencing cache_new in _dl_load_cache_lookup() in ld-linux here: https://sourceware.org/git/?p=glibc.git;a=blob;f=elf/dl-cache.c;h=88bf78ad7c914b02109d6ddef7e08c0e8fd4574d;hb=f94f6d8a3572840d3ba42ab9ace3ea522c99c0c2#l489

Which is coming from a libomp fork handler:

  #0  0x00007f5566f9d774 in _dl_load_cache_lookup (name=name@entry=0x7f55654afde6 "libmemkind.so") at ./elf/dl-cache.c:498
  #1  0x00007f5566f91982 in _dl_map_object (loader=loader@entry=0x55f8a170b670, name=name@entry=0x7f55654afde6 "libmemkind.so", type=type@entry=2, trace_mode=trace_mode@entry=0, mode=mode@entry=-1879048191, nsid=<optimized out>) at ./elf/dl-load.c:2193
  #2  0x00007f5566f959a9 in dl_open_worker_begin (a=a@entry=0x7fffcf5851f0) at ./elf/dl-open.c:534
  #3  0x00007f5566b4ab08 in __GI__dl_catch_exception (exception=exception@entry=0x7fffcf585050, operate=operate@entry=0x7f5566f95900 <dl_open_worker_begin>, args=args@entry=0x7fffcf5851f0) at ./elf/dl-error-skeleton.c:208
  #4  0x00007f5566f94f9a in dl_open_worker (a=a@entry=0x7fffcf5851f0) at ./elf/dl-open.c:782
  #5  0x00007f5566b4ab08 in __GI__dl_catch_exception (exception=exception@entry=0x7fffcf5851d0, operate=operate@entry=0x7f5566f94f60 <dl_open_worker>, args=args@entry=0x7fffcf5851f0) at ./elf/dl-error-skeleton.c:208
  #6  0x00007f5566f9534e in _dl_open (file=<optimized out>, mode=-2147483647, caller_dlopen=0x7f55653fa882, nsid=-2, argc=9, argv=<optimized out>, env=0x55f8a1477e10) at ./elf/dl-open.c:883
  #7  0x00007f5566a6663c in dlopen_doit (a=a@entry=0x7fffcf585460) at ./dlfcn/dlopen.c:56
  #8  0x00007f5566b4ab08 in __GI__dl_catch_exception (exception=exception@entry=0x7fffcf5853c0, operate=<optimized out>, args=<optimized out>) at ./elf/dl-error-skeleton.c:208
  #9  0x00007f5566b4abd3 in __GI__dl_catch_error (objname=0x7fffcf585418, errstring=0x7fffcf585420, mallocedp=0x7fffcf585417, operate=<optimized out>, args=<optimized out>) at ./elf/dl-error-skeleton.c:227
  #10 0x00007f5566a6612e in _dlerror_run (operate=operate@entry=0x7f5566a665e0 <dlopen_doit>, args=args@entry=0x7fffcf585460) at ./dlfcn/dlerror.c:138
  #11 0x00007f5566a666c8 in dlopen_implementation (dl_caller=<optimized out>, mode=<optimized out>, file=<optimized out>) at ./dlfcn/dlopen.c:71
  #12 ___dlopen (file=<optimized out>, mode=<optimized out>) at ./dlfcn/dlopen.c:81
  #13 0x00007f55653fa882 in ?? () from /usr/lib/llvm-14/lib/libomp.so.5
  #14 0x00007f5565413556 in ?? () from /usr/lib/llvm-14/lib/libomp.so.5
  #15 0x00007f5565421d1a in ?? () from /usr/lib/llvm-14/lib/libomp.so.5
  #16 0x00007f5566ac0fc1 in __run_fork_handlers (who=who@entry=atfork_run_child, do_locking=do_locking@entry=true) at ./posix/register-atfork.c:130
  #17 0x00007f5566ac08d3 in __libc_fork () at ./posix/fork.c:108
  #18 0x00007f5566e108ad in os_fork_impl (module=<optimized out>) at ./Modules/posixmodule.c:6250
  #19 os_fork (module=<optimized out>, _unused_ignored=<optimized out>) at ./Modules/clinic/posixmodule.c.h:2750

This doesn't happen in Python 3.9, which I bisected to CPython commit 45a78f906d2d ("bpo-44434: Don't call PyThread_exit_thread() explicitly (GH-26758)") (in v3.11, backported to v3.9.6). That commit describes a different symptom where the process aborts because libgcc_s can't be loaded. I don't understand how that issue can cause our crash, but the fix appears to be the same. The discussion also suggests a workaround: linking to libgcc_s explicitly.

Apply the workaround, which appears to fix our problem. We only do this for the CI and not for the general build for a few reasons:

1. I'm nervous about explicitly linking to this low-level library unconditionally, and the logic to decide when it's necessary (only for Python 3.8 and glibc) isn't worth the trouble.
2. The situation required to hit it (drgn + Python threading + fork) is unlikely outside of our test suite.
3. Python 3.8 is EOL.
4. Builds with libkdumpfile already pull in libgcc_s via libkdumpfile -> libsnappy -> libstdc++ -> libgcc_s.

Signed-off-by: Omar Sandoval <[email protected]>
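To show why a child that dies before reporting back surfaces as an EOFError in the parent, here is a minimal sketch using only a pipe and `pickle`. It assumes a much simpler protocol than the real `fork_and_stop()` helper, whose implementation is not shown here; `load_from_pipe` is a hypothetical name, and closing the write end without writing stands in for the segfaulting child.

```python
import os
import pickle


def load_from_pipe(payload: bytes):
    """Write payload into a pipe, close the write end, and unpickle from the read end."""
    r_fd, w_fd = os.pipe()
    with os.fdopen(w_fd, "wb") as pipe_w:
        pipe_w.write(payload)
    with os.fdopen(r_fd, "rb") as pipe_r:
        return pickle.load(pipe_r)


# A healthy "child" reports a result before exiting.
ok_result = load_from_pipe(pickle.dumps({"pid": 1234}))

# A "child" that crashes immediately writes nothing, so the reader sees
# end-of-file where it expected a pickle.
try:
    load_from_pipe(b"")
    crashed_error = None
except EOFError as e:
    crashed_error = e
```

The exception in the parent therefore says nothing about the cause; the kernel log line and the backtrace above are what identify the crash in the child.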
No description provided.