[Deepin-Kernel-SIG] [linux 6.6-y] [Upstream] drm/amdgpu: fix task hang from failed job submission during process kill #1148
mainline inclusion
from mainline-v6.17-rc2
category: bugfix
During process kill, drm_sched_entity_flush() will kill the vm
entities. The following job submissions of this process will fail, and
the resources of these jobs have not been released, nor have the fences
been signalled, causing tasks to hang and timeout.
Fix by check entity status in amdgpu_vm_ready() and avoid submit jobs to
stopped entity.
v2: add amdgpu_vm_ready() check before amdgpu_vm_clear_freed() in
function amdgpu_cs_vm_handling().
Fixes: 1f02f2044bda ("drm/amdgpu: Avoid extra evict-restore process.")
Signed-off-by: Liu01 Tong <[email protected]>
Signed-off-by: Lin.Cao <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit f101c13a8720c73e67f8f9d511fbbeda95bcedb1)
(cherry picked from commit aa5fc4362fac9351557eb27c745579159a2e4520)
Signed-off-by: Wentao Guan <[email protected]>
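The fix described in the commit message can be illustrated with a small userspace model. This is only a sketch, not the upstream patch: the `model_*` types and functions are hypothetical stand-ins for the amdgpu VM and its scheduler entities, the `evicting` flag stands in for whatever readiness conditions already existed, and locking is omitted. It shows the idea of treating a VM as not ready once its entities have been stopped, and of bailing out with -EINVAL before any job is submitted.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a scheduler entity; only the "stopped" flag is modeled. */
struct model_entity {
	bool stopped;
};

/* Hypothetical stand-in for the per-process VM state (assumed field names). */
struct model_vm {
	bool evicting;                 /* assumed pre-existing readiness condition */
	struct model_entity immediate; /* entity used for immediate updates */
	struct model_entity delayed;   /* entity used for delayed updates */
};

/* Models what happens on process kill: the VM entities are stopped. */
static void model_kill_entities(struct model_vm *vm)
{
	vm->immediate.stopped = true;
	vm->delayed.stopped = true;
}

/* The VM counts as ready only if no entity has been stopped. */
static bool model_vm_ready(const struct model_vm *vm)
{
	bool ret = !vm->evicting;

	ret &= !vm->immediate.stopped;
	ret &= !vm->delayed.stopped;
	return ret;
}

/*
 * Models the submission path: check readiness first, so that no job is
 * created for an entity that can no longer run it.
 */
static int model_cs_vm_handling(const struct model_vm *vm, int *jobs_submitted)
{
	if (!model_vm_ready(vm))
		return -EINVAL;

	(*jobs_submitted)++;
	return 0;
}
```

In this model, a submission attempted after `model_kill_entities()` returns -EINVAL and leaves the job counter untouched, which corresponds to the commit's goal of never queuing work whose resources and fences would otherwise be left unreleased and unsignalled.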
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
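deepin pr auto review
This change mainly adds a check of the virtual machine (VM) state, ensuring that the VM is ready when a command submission (CS) is handled. Here is my analysis of the improvement:
if (!amdgpu_vm_ready(vm))
return -EINVAL;
This is good defensive programming practice: it ensures the VM is in a ready state before processing continues.
Suggested improvements:
bool ret = true;
Overall, this change adds a more comprehensive check of the VM state and improves the robustness of the system. However, attention should be paid to lock usage and initialization to avoid potential bugs.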
Reviewer's Guide
Adds entity stop status checks to VM readiness and prevents job submission to stopped entities to avoid hangs and resource leaks during process kill.
[6.6.103]