
Conversation

@wxsIcey (Collaborator) commented Nov 14, 2025

What this PR does / why we need it?

Fixes a compatibility bug with torch_npu.npu_fused_infer_attention_score, which is described in #4020.
@momo609 suggested this solution.
cherry-pick: #4025

Does this PR introduce any user-facing change?

N/A

How was this patch tested?

CI passed with newly added and existing tests.

@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by other future PRs.
  • Write the commit message by filling out the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@wxsIcey added the ready (read for review) and ready-for-test (start test by label for PR) labels on Nov 14, 2025
@gemini-code-assist bot (Contributor) left a comment


Code Review

The pull request updates a block size parameter from 64 to 128 in two distinct files. While the change itself appears to be a coordinated update, the direct hardcoding of this magic number in multiple locations introduces a significant maintainability risk. It is highly recommended to centralize such common configuration values into a single named constant, ideally in a shared utility module, to ensure consistency and simplify future updates.

 @staticmethod
 def get_supported_block_size() -> list[int]:
-    return [64]
+    return [128]

Severity: high

The block size 128 is a magic number that appears in multiple files. Consider defining this value as a named constant in a shared utility module (e.g., vllm_ascend/utils.py) to improve maintainability and ensure consistency across the codebase. Duplicating such values can lead to errors if the value needs to be updated in the future.
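A minimal sketch of the consolidation suggested above, assuming a hypothetical constant name ASCEND_ATTENTION_BLOCK_SIZE; neither the constant nor its placement in vllm_ascend/utils.py is part of this PR:

# vllm_ascend/utils.py -- hypothetical addition, not part of this PR
# Single source of truth for the attention block size. This PR changes the
# value from 64 to 128 to stay compatible with
# torch_npu.npu_fused_infer_attention_score (see #4020).
ASCEND_ATTENTION_BLOCK_SIZE: int = 128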

 ).page_size_bytes

-block_alignment_bytes = 64
+block_alignment_bytes = 128

Severity: high

This block_alignment_bytes value of 128 duplicates the value returned by get_supported_block_size in vllm_ascend/attention/attention_v1.py. To enhance maintainability and prevent inconsistencies, it is recommended to define this as a single, shared constant in a central utility file.
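A sketch of how both call sites could consume that shared value, reusing the hypothetical ASCEND_ATTENTION_BLOCK_SIZE from the sketch above; the class name and import path are illustrative, not taken from this PR:

# vllm_ascend/attention/attention_v1.py (sketch)
from vllm_ascend.utils import ASCEND_ATTENTION_BLOCK_SIZE  # hypothetical constant


class AscendAttentionBackend:  # illustrative class name
    @staticmethod
    def get_supported_block_size() -> list[int]:
        return [ASCEND_ATTENTION_BLOCK_SIZE]


# In the file that computes block_alignment_bytes (sketch)
block_alignment_bytes = ASCEND_ATTENTION_BLOCK_SIZE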
