[sparse] Add fp8 sparse gemm with rowwise scaling for activation sparsity #2242
Conversation
Summary: We already have this gemm in torchao, but for weight sparsity, which assumes the weights are stored in row-major format and are sparse. For activation sparsity, we need the weights to be stored in column-major format so we can use the selective weight-loading kernel for decode.
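To make the layout change concrete, here is a minimal sketch of the operand layout switch, assuming B is the weight operand (the alias names are illustrative, not the PR's actual ones):

#include <cutlass/layout/matrix.h>

// Existing weight-sparsity kernel: weights are the sparse operand, row-major.
using LayoutB_WeightSparse = cutlass::layout::RowMajor;

// This PR (activation sparsity): activations are sparse, so the dense weights
// are stored column-major for the decode-time selective weight-loading path.
using LayoutB_ActivationSparse = cutlass::layout::ColumnMajor;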
Dr. CI: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2242. No failures, 4 pending as of commit e17ebfd (merge base f0f976c).
lgtm, left a couple minor comments
using ElementOut = cutlass::bfloat16_t;
using ElementAccumulator = float;

using TileShape = cute::Shape<cute::_128, cute::_256, cute::_128>;
how was this tile shape selected?
This is the default I copied over; I'm planning on adding some tuning in a subsequent PR for more perf.
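For context, a rough sketch of what that tuning could look like: instantiate the kernel over a few candidate CTA tile shapes and dispatch on problem size (the candidate shapes below are illustrative, not benchmarked):

#include <cute/tensor.hpp>

// The CTA tile is a compile-time cute::Shape<M, N, K>, so each candidate
// shape requires its own kernel instantiation.
using TileShapeDefault = cute::Shape<cute::_128, cute::_256, cute::_128>;  // current default
using TileShapeSmallM  = cute::Shape<cute::_64, cute::_128, cute::_128>;   // hypothetical candidate for small-M decode shapes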
cutlass::arch::OpClassSparseTensorOp,
ElementA,
cutlass::layout::RowMajor,
32,
nit: it would help with readability to give these constant args variable names IMO
yeah good point, will address these nits when I add in the tile config tuning; just want to get unblocked for now.
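As a sketch of what addressing the nit could look like (the names are hypothetical, and reading 32 as the per-operand alignment in elements is an assumption to check against the builder signature):

// Hypothetical named constants for the builder args shown above.
using LayoutA = cutlass::layout::RowMajor;
static constexpr int AlignmentA = 32;  // assumed: elements per vectorized access for operand A

// ... cutlass::arch::OpClassSparseTensorOp,
//     ElementA, LayoutA, AlignmentA, ...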
  device_guard.emplace(tensor_a.device());
}

using K = SparseRowwiseKernel<cutlass::float_e4m3_t>;
nit: more descriptive variable name would be helpful
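For instance (the alias name here is hypothetical):

// Spell out the element type and kernel family instead of a single letter.
using SparseRowwiseKernelE4M3 = SparseRowwiseKernel<cutlass::float_e4m3_t>;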