@DDEle DDEle commented Nov 12, 2025

Proposed changes

  • Add fp8/bf8 scaled WarpGemm
  • Enable double access mode for the above
  • Slightly reformat and refactor

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

  • I have added tests relevant to the introduced functionality, and the unit tests are passing locally
  • I have added the test to REGRESSION_TESTS list defined at the top of CMakeLists.txt in tests/CMakeLists.txt, IF the test takes more than 30 seconds to run.
  • I have added inline documentation which helps the maintainers understand the motivation
  • I have removed the stale documentation which is no longer relevant after this pull request
  • (If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
  • I have run clang-format on all changed files
  • Any dependent changes have been merged

Copilot AI left a comment

Pull Request Overview

This PR refactors and improves the F8F6F4 scaled warp GEMM implementation by consolidating multiple type-specific implementations into a single generic template structure and enabling multiple access modes (single, double, quad) for fp8/bf8 operations.

Key changes:

  • Consolidates multiple specialized WarpGemmAttributeMfmaImpl_f32_16x16x128_* structs into a single generic WarpGemmAttributeMfmaImpl_f32_16x16x128_f8f6f4 template
  • Adds new inner namespace wrap_gemm_dispatcher within impl namespace with enum constant aliases for cleaner code
  • Enables templated dispatcher pattern for scaled MFMA operations (fp8/bf8 combinations with 16x16x128 configuration) to support multiple access modes
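The consolidation and dispatcher pattern described in the key changes can be illustrated with a minimal C++17 sketch. It is not the CK_TILE code: `fp8_t`, `bf8_t`, `AccessMode`, `format_code_sketch`, `WarpGemmImplSketch_16x16x128`, and `DispatcherSketch` are hypothetical names introduced here, and the format codes are illustrative values only. The sketch assumes one template parameterised on the A and B element types stands in for several per-type structs, and that a dispatcher specialisation carries the access mode as a template parameter.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-ins for the 8-bit float types (not the CK_TILE definitions).
struct fp8_t { std::uint8_t bits; };
struct bf8_t { std::uint8_t bits; };

// Hypothetical access-mode tag; the PR overview mentions single, double, and quad modes.
enum class AccessMode { Single, Double, Quad };

// Maps an element type to an illustrative format code with an immediately
// invoked lambda and `if constexpr`, so unsupported types fail at compile time.
template <typename T>
constexpr int format_code_sketch()
{
    return [] {
        if constexpr (std::is_same_v<T, fp8_t>) { return 0; }
        else if constexpr (std::is_same_v<T, bf8_t>) { return 1; }
        else { static_assert(sizeof(T) == 0, "unsupported element type"); }
    }();
}

// One generic implementation struct in place of one struct per (A, B) type pair.
template <typename AType, typename BType>
struct WarpGemmImplSketch_16x16x128
{
    static constexpr int kM = 16;
    static constexpr int kN = 16;
    static constexpr int kK = 128;
    static constexpr int kAFormat = format_code_sketch<AType>();
    static constexpr int kBFormat = format_code_sketch<BType>();
};

// Primary template left undefined: unsupported shapes fail to compile.
template <typename AType, typename BType, int M, int N, int K, AccessMode Mode>
struct DispatcherSketch;

// A single partial specialisation covers every type pair and access mode
// for the 16x16x128 shape.
template <typename AType, typename BType, AccessMode Mode>
struct DispatcherSketch<AType, BType, 16, 16, 128, Mode>
{
    using Impl = WarpGemmImplSketch_16x16x128<AType, BType>;
    static constexpr AccessMode kMode = Mode;
};

// Compile-time check: a mixed fp8/bf8 pair resolves to the generic struct.
using MixedSketch = DispatcherSketch<fp8_t, bf8_t, 16, 16, 128, AccessMode::Double>;
static_assert(std::is_same_v<MixedSketch::Impl, WarpGemmImplSketch_16x16x128<fp8_t, bf8_t>>);
```

Under these assumptions, supporting another type pair or access mode needs no new struct; the existing partial specialisation is simply instantiated with different arguments.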

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| warp_gemm_dispatcher.hpp | Refactored dispatcher by introducing inner namespace, renaming template struct, and adding templated specializations for scaled f8f6f4 operations |
| warp_gemm_attribute_mfma_impl.hpp | Consolidated multiple type-specific implementations into single generic template with improved type handling via lambdas |
| warp_gemm.hpp | Updated type aliases to reference the new unified implementation structure |

Copilot finished reviewing on behalf of DDEle November 13, 2025 01:42
Copilot AI left a comment

Pull Request Overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

@DDEle DDEle mentioned this pull request Nov 13, 2025
10 tasks
@asleepzzz asleepzzz merged commit 8d50001 into develop Nov 13, 2025
21 of 22 checks passed
@asleepzzz asleepzzz deleted the f8f6f4-warp-gemm branch November 13, 2025 12:22
pmaybank pushed a commit that referenced this pull request Nov 13, 2025
* [CK_TILE] Improve F8F6F4 Scaled WarpGemm

* Thanks, Copilot
3 participants