[torch.compile] Fix RMSNorm + quant fusion in the non-cutlass-fp8 case, rename RedundantReshapesPass to NoopEliminationPass #10902
Conversation
vllm/compilation/reshapes.py (outdated)
Are these always the right ops to use? e.g. is there a torch.ops.aten.slice.default or a torch.ops.aten.slice_scatter.Tensor?
I haven't seen them, so I am not sure - I just went off what I saw. The other overloads could be added easily if we ever see them in the graph.
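For context, here is a minimal sketch of the kind of overload check being discussed, assuming the `torch.ops.aten.slice.Tensor` / `torch.ops.aten.slice_scatter.default` overloads seen in the graph; `is_noop_slice` is a hypothetical helper for illustration, not the actual pass code:

```python
import torch
from torch import fx


def is_noop_slice(node: fx.Node) -> bool:
    """Hypothetical check: is this aten.slice.Tensor call a full-dim no-op?"""
    if node.op != "call_function" or node.target != torch.ops.aten.slice.Tensor:
        return False
    # aten.slice.Tensor(self, dim=0, start=None, end=None, step=1)
    args = list(node.args) + [None] * (5 - len(node.args))
    inp, dim, start, end, step = args[:5]
    if step not in (None, 1) or start not in (None, 0):
        return False
    # Only a statically known end that covers the whole dimension is a no-op;
    # in particular end = -1 drops the last element and must be kept.
    fake = inp.meta.get("val", None)
    if fake is None or not isinstance(end, int) or end < 0:
        return False
    return end >= fake.shape[dim or 0]
```

Other overloads could be supported by extending the target check if they ever show up in traced graphs.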
    # This could change in the future.
    # We also don't pad when using torch.compile,
    # as it breaks with dynamic shapes.
    config = get_current_vllm_config().compilation_config
Is this cached? It could be expensive on each forward call.
Yes, in eager mode this will get called on every forward pass, but it will only happen once when compiled. In eager mode there isn't really a better way that's still correct - the only way is to check the config context. I don't think this getter is significant but I haven't measured it.
We could add an `allow_input_padding` flag and pass it in? I do think this is annoying though. I think it's worth it to do a quick check for performance regressions on a small-model eager-mode benchmark with `cutlass_scaled_mm` disabled?
I think we'd have to pass that flag through the whole call stack though so I don't think it's worth it. I'll run a small model.
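For reference, one way the lookup could be amortized if it ever showed up in profiles. This is only a sketch under the assumption that the compilation level is fixed for the lifetime of the process (exactly the config-context caveat above); `_padding_disabled_by_compile` is a hypothetical helper:

```python
from functools import cache

from vllm.config import get_current_vllm_config  # getter from the snippet above


@cache
def _padding_disabled_by_compile() -> bool:
    # Resolved once per process. Only correct if the active vLLM config
    # context does not change after the first call; otherwise the
    # per-forward lookup shown in the diff is the safe choice.
    config = get_current_vllm_config().compilation_config
    return config.level > 0  # assumption: any compilation level disables padding
```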
tlrmchlsmth left a comment:

Looks good overall but I had a few minor comments.
- rename cutlass_fp8 test flag
- rename noop pass
- improve some comments

Signed-off-by: luka <[email protected]>
tlrmchlsmth left a comment:

Thanks for the great work! LGTM assuming we don't see any performance regression.
Yep, will post perf numbers once I have them, thanks!
nit: noop elimination for slice errors when end = -1; similarly for slice_scatter.
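The repro snippet itself is not reproduced above; a minimal sketch of the behavior it points at, assuming standard `aten.slice` / `aten.slice_scatter` semantics:

```python
import torch

x = torch.arange(6)

# end = -1 excludes the last element, so this slice is NOT a no-op ...
print(torch.ops.aten.slice.Tensor(x, 0, 0, -1))           # tensor([0, 1, 2, 3, 4])

# ... whereas slicing up to the full length is.
print(torch.ops.aten.slice.Tensor(x, 0, 0, x.shape[0]))   # tensor([0, 1, 2, 3, 4, 5])

# Same for slice_scatter: scattering into [0:-1] leaves the last element alone.
src = torch.zeros(5, dtype=x.dtype)
print(torch.ops.aten.slice_scatter.default(x, src, 0, 0, -1))  # tensor([0, 0, 0, 0, 0, 5])
```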
Great find - I didn't realize `slice` handles `end = -1` that way.
Issue: #17078. Btw, PyTorch supports noop elimination for `view`, `slice`, and `slice_scatter` now, which should be equivalent to this pass.
Okay, sounds good, we can probably deprecate the pass, although it's nice to have an easy place to add noop transforms to unblock pattern matching in the short term, before the fixes are upstreamed to torch.
This PR fixes RMSNorm + quant fusion in the `fp8` case when `cutlass_mm` is not available. It contains the following fixes:

- Don't pad the input to `fp8` `torch._scaled_mm` in the `torch.compile` case, as branch specialization might not work correctly, and it makes fusion difficult.
- Add `slice` and `slice_scatter` elimination, which is implemented in PyTorch but does not cover all cases. This renames the `RedundantReshapesPass` to `NoopEliminationPass`.

This PR is a prerequisite of #10836, which enables `torch.compile` on AMD and uses the non-cutlass-fp8 path.
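To illustrate the first fix, here is a sketch of the padding guard, assuming `get_current_vllm_config` is the getter from the review snippet above and that any non-zero compilation level means `torch.compile` is in use; the helper name and the minimum-row constant are made up for illustration:

```python
import torch
from vllm.config import get_current_vllm_config

# Hypothetical constant: some torch._scaled_mm paths prefer a minimum number
# of rows in eager mode; the real value and condition in vLLM may differ.
_MIN_SCALED_MM_ROWS = 17


def maybe_pad_fp8_input(qinput: torch.Tensor) -> torch.Tensor:
    # Padding helps the eager fp8 path, but under torch.compile it breaks
    # dynamic shapes and gets in the way of the RMSNorm + quant fusion,
    # so skip it whenever a compilation level is set.
    config = get_current_vllm_config().compilation_config
    if config.level > 0:
        return qinput
    pad = _MIN_SCALED_MM_ROWS - qinput.shape[0]
    if pad <= 0:
        return qinput
    # Pad the token (first) dimension of a 2D [num_tokens, hidden_size] input.
    return torch.nn.functional.pad(qinput, (0, 0, 0, pad))
```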