Skip to content

Conversation

@opsiff
Copy link
Member

@opsiff opsiff commented Sep 5, 2025

[6.6.103]

mainline inclusion
from mainline-v6.17-rc2
category: bugfix

During process kill, drm_sched_entity_flush() will kill the vm entities. The following job submissions of this process will fail, and the resources of these jobs have not been released, nor have the fences been signalled, causing tasks to hang and timeout.

Fix by check entity status in amdgpu_vm_ready() and avoid submit jobs to stopped entity.

v2: add amdgpu_vm_ready() check before amdgpu_vm_clear_freed() in function amdgpu_cs_vm_handling().

Fixes: 1f02f2044bda ("drm/amdgpu: Avoid extra evict-restore process.")

Reviewed-by: Christian König [email protected]

(cherry picked from commit f101c13a8720c73e67f8f9d511fbbeda95bcedb1) (cherry picked from commit aa5fc4362fac9351557eb27c745579159a2e4520)

Summary by Sourcery

Bug Fixes:

  • Prevent hung tasks after process kill by avoiding job submissions to stopped VM entities in amdgpu

mainline inclusion
from mainline-v6.17-rc2
category: bugfix

During process kill, drm_sched_entity_flush() will kill the vm
entities. The following job submissions of this process will fail, and
the resources of these jobs have not been released, nor have the fences
been signalled, causing tasks to hang and timeout.

Fix by check entity status in amdgpu_vm_ready() and avoid submit jobs to
stopped entity.

v2: add amdgpu_vm_ready() check before amdgpu_vm_clear_freed() in
function amdgpu_cs_vm_handling().

Fixes: 1f02f2044bda ("drm/amdgpu: Avoid extra evict-restore process.")
Signed-off-by: Liu01 Tong <[email protected]>
Signed-off-by: Lin.Cao <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit f101c13a8720c73e67f8f9d511fbbeda95bcedb1)
(cherry picked from commit aa5fc4362fac9351557eb27c745579159a2e4520)
Signed-off-by: Wentao Guan <[email protected]>
@deepin-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from opsiff. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@deepin-ci-robot
Copy link

deepin pr auto review

这段代码的修改主要是增加了对虚拟机(VM)状态的检查,确保在处理命令提交(cs)时VM处于就绪状态。我来分析一下代码的改进:

  1. 在amdgpu_cs_vm_handling函数中添加了对amdgpu_vm_ready的检查:
if (!amdgpu_vm_ready(vm))
    return -EINVAL;

这是一个很好的防御性编程实践,确保在继续处理之前VM处于就绪状态。

  1. 在amdgpu_vm_ready函数中:
  • 移除了不必要的empty变量
  • 增加了对immediate和delayed状态的检查
  • 改进了返回值的计算逻辑

改进建议:

  1. 代码安全性:
  • 在amdgpu_vm_ready函数中,多个spin_lock/unlock操作可能导致潜在的死锁风险。建议考虑使用spin_lock_irqsave/spin_unlock_irqsave来替代,以避免在持有锁时被中断。
  1. 代码性能:
  • 当前实现需要获取多个锁,这可能会影响性能。可以考虑将相关状态变量放在同一个数据结构中,使用单个锁来保护,减少锁的获取/释放次数。
  1. 代码逻辑:
  • 在amdgpu_vm_ready函数中,ret变量在第一次使用前没有初始化,这可能导致未定义行为。应该在使用前初始化为true:
bool ret = true;
  1. 代码可读性:
  • 可以添加注释来解释immediate和delayed状态的含义,以及为什么需要检查这些状态。
  1. 错误处理:
  • amdgpu_cs_vm_handling函数中对amdgpu_vm_ready的检查应该在获取vm之后立即进行,而不是在其他操作之后。这样可以尽早失败,避免不必要的资源消耗。
  1. 同步机制:
  • 考虑使用更高级别的同步机制,如RCU(Read-Copy-Update),如果这些状态主要是读操作的话,可以提高并发性能。

总体而言,这个改进增加了对VM状态的更全面检查,提高了系统的健壮性。但需要注意锁的使用和初始化问题,以避免潜在的bug。

@sourcery-ai
Copy link

sourcery-ai bot commented Sep 5, 2025

Reviewer's guide (collapsed on small PRs)

Reviewer's Guide

Adds entity stop status checks to VM readiness and prevents job submission to stopped entities to avoid hangs and resource leaks during process kill.

File-Level Changes

Change Details Files
Extend amdgpu_vm_ready() to verify VM entities aren’t stopped
  • Remove redundant empty variable
  • Include eviction list status in ret
  • Lock and check immediate.stopped flag
  • Lock and check delayed.stopped flag
  • Simplify return to ret
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
Guard job submission in amdgpu_cs_vm_handling() against stopped VMs
  • Invoke amdgpu_vm_ready() before clearing freed VMs
  • Return -EINVAL when VM is not ready
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes and they look great!


Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@opsiff opsiff merged commit 86680d1 into deepin-community:linux-6.6.y Sep 5, 2025
7 of 11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants