[CK_TILE] Improve F8F6F4 Scaled WarpGemm #3197

DDEle · 2025-11-12T09:49:54Z

Proposed changes

Add fp8/bf8 scaled WarpGemm
Enable double access mode for the fp8/bf8 scaled WarpGemm
Slightly reformat and refactor the surrounding code

Checklist

I have added tests relevant to the introduced functionality, and the unit tests are passing locally
I have added the test to REGRESSION_TESTS list defined at the top of CMakeLists.txt in tests/CMakeLists.txt, IF the test takes more than 30 seconds to run.
I have added inline documentation which enables the maintainers with understanding the motivation
I have removed the stale documentation which is no longer relevant after this pull request
(If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
I have run clang-format on all changed files
Any dependent changes have been merged

Copilot

Pull Request Overview

This PR refactors and improves the F8F6F4 scaled warp GEMM implementation by consolidating multiple type-specific implementations into a single generic template structure and enabling multiple access modes (single, double, quad) for fp8/bf8 operations.

Key changes:

Consolidates multiple specialized WarpGemmAttributeMfmaImpl_f32_16x16x128_* structs into a single generic WarpGemmAttributeMfmaImpl_f32_16x16x128_f8f6f4 template
Adds new inner namespace wrap_gemm_dispatcher within impl namespace with enum constant aliases for cleaner code
Enables templated dispatcher pattern for scaled MFMA operations (fp8/bf8 combinations with 16x16x128 configuration) to support multiple access modes
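
The consolidation and dispatcher pattern described in the key changes can be sketched roughly as follows. This is a minimal illustration, not the ck_tile code: every name here (`fp8_t`, `bf8_t`, `NumAccess`, `format_code`, `MfmaImpl_f32_16x16x128_scaled`, `ScaledWarpGemm`, `Dispatcher`) is a hypothetical stand-in, the format code values are arbitrary, and the sketch assumes that an access mode simply splits each lane's K-slice into that many equal chunks, which is an interpretation rather than something stated in this PR.

```cpp
#include <type_traits>

// Hypothetical tag types standing in for the real fp8/bf8 element types.
struct fp8_t {};
struct bf8_t {};

// Hypothetical access-mode enum; the value is the number of accesses per lane.
enum class NumAccess { Single = 1, Double = 2, Quad = 4 };

// Per-type format code, looked up at compile time (values are illustrative only).
template <typename T> struct format_code;
template <> struct format_code<fp8_t> { static constexpr int value = 0; };
template <> struct format_code<bf8_t> { static constexpr int value = 1; };

// One generic implementation struct replaces a separate struct per (A, B) type pair.
template <typename AType, typename BType>
struct MfmaImpl_f32_16x16x128_scaled
{
    static constexpr int kM        = 16;
    static constexpr int kN        = 16;
    static constexpr int kK        = 128;
    static constexpr int kWaveSize = 64; // assumed lanes per wave
    static constexpr int kAFormat  = format_code<AType>::value;
    static constexpr int kBFormat  = format_code<BType>::value;
    // K elements of A held by each lane: (16 * 128) / 64 = 32.
    static constexpr int kKPerLane = kM * kK / kWaveSize;
};

// Warp GEMM wrapper that layers an access mode on top of the implementation.
template <typename Impl, NumAccess Access>
struct ScaledWarpGemm
{
    using ImplType = Impl;
    static constexpr int kNumAccess = static_cast<int>(Access);
    static_assert(Impl::kKPerLane % kNumAccess == 0,
                  "per-lane K slice must split evenly across accesses");
    static constexpr int kKPerAccess = Impl::kKPerLane / kNumAccess;
};

// Primary template is left undefined so unsupported shapes fail to compile.
template <typename A, typename B, int M, int N, int K,
          NumAccess Access = NumAccess::Single>
struct Dispatcher;

// One partial specialization covers every fp8/bf8 pair and every access mode.
template <typename A, typename B, NumAccess Access>
struct Dispatcher<A, B, 16, 16, 128, Access>
{
    using Type = ScaledWarpGemm<MfmaImpl_f32_16x16x128_scaled<A, B>, Access>;
};

// Usage: a mixed fp8 x bf8 warp GEMM with double access.
using Fp8Bf8Double =
    Dispatcher<fp8_t, bf8_t, 16, 16, 128, NumAccess::Double>::Type;
static_assert(Fp8Bf8Double::kKPerAccess == 16, "32 K elements split into 2 accesses");
```

The point of the pattern is that adding a new operand type only needs a new `format_code` specialization, and adding a new access mode needs no new specialization at all.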

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File	Description
warp_gemm_dispatcher.hpp	Refactored dispatcher by introducing inner namespace, renaming template struct, and adding templated specializations for scaled f8f6f4 operations
warp_gemm_attribute_mfma_impl.hpp	Consolidated multiple type-specific implementations into single generic template with improved type handling via lambdas
warp_gemm.hpp	Updated type aliases to reference the new unified implementation structure

Copilot

Pull Request Overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

[CK_TILE] Improve F8F6F4 Scaled WarpGemm

9c24afb

DDEle requested review from ThomasNing, afagaj, andriy-ca, aosewski, asleepzzz, bartekxk, carlushuang, cgmillette, coderfeli, geyyer, illsilin, poyenc, qianfengz, shumway, tenpercent and vidyasagar-amd as code owners November 12, 2025 09:49

DDEle requested a review from Copilot November 12, 2025 10:09

Copilot started reviewing on behalf of DDEle November 12, 2025 10:10 View session

Copilot finished reviewing on behalf of DDEle November 12, 2025 10:11

Copilot AI reviewed Nov 12, 2025

Thanks, Copilot

aafe6ab

DDEle requested a review from Copilot November 13, 2025 01:40

Copilot started reviewing on behalf of DDEle November 13, 2025 01:41 View session

Merge branch 'develop' into f8f6f4-warp-gemm

ffe07dc

Copilot finished reviewing on behalf of DDEle November 13, 2025 01:42

Copilot AI reviewed Nov 13, 2025

DDEle mentioned this pull request Nov 13, 2025

[CK_TILE] Add Flatmm MX FP8 #3208 (draft, 10 tasks)

asleepzzz approved these changes Nov 13, 2025

asleepzzz merged commit 8d50001 into develop Nov 13, 2025
21 of 22 checks passed

asleepzzz deleted the f8f6f4-warp-gemm branch November 13, 2025 12:22

pmaybank pushed a commit that referenced this pull request Nov 13, 2025

[CK_TILE] Improve F8F6F4 Scaled WarpGemm (#3197)

c0c2458

* [CK_TILE] Improve F8F6F4 Scaled WarpGemm
* Thanks, Copilot
