@ecamartins ecamartins commented Nov 12, 2025

Proposed changes

The original CK Stream-K implementation used a Tile Partitioner that was based on old CK's Stream-K block-to-C-tile map. However, old CK's implementation did not align with the original Stream-K paper. Thus, we implemented a new Tile Partitioner (#3018) and an associated Stream-K Kernel (#3064). The new CK Tile Stream-K kernel implementation was placed in the reboot namespace.

Now that all functionality for the new implementation is in place, including examples (#3107) and tile engine (#3157), we can remove the old CK Tile Stream-K implementation. This PR makes the following changes:

  • Removes all uses of the old CK Tile Stream-K implementation.
  • Removes the reboot namespace such that the new implementation is in the ck_tile namespace only.
  • Adds tests for fp8 and bf8 for the new implementation as these were only in place for the old implementation.
  • Updates the comment style in the Stream-K kernel file to follow the /** style.
  • Removes the old CK Tile Stream-K Tile partitioner.
  • Removes the v2 suffix from the new CK Tile Tile Partitioner derived classes.
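
For callers, the namespace and suffix changes amount to a rename at each use site. The sketch below is illustrative only: `ExampleStreamKTilePartitioner` and its members are hypothetical stand-ins, not the actual CK Tile class names or API, and the `ck_tile` namespace here is declared locally rather than pulled from the real headers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the library namespace; these are NOT the real
// CK Tile headers or class names.
namespace ck_tile {

// Hypothetical partitioner type. Before this change, the equivalent type
// would have lived in ck_tile::reboot and carried a _v2 suffix.
struct ExampleStreamKTilePartitioner
{
    std::uint32_t tiles_m;
    std::uint32_t tiles_n;

    // Total number of output (C) tiles covered by this partitioner.
    constexpr std::uint32_t num_tiles() const { return tiles_m * tiles_n; }
};

} // namespace ck_tile

// Before (hypothetical): ck_tile::reboot::ExampleStreamKTilePartitioner_v2 p{4, 8};
// After: same type, now directly in ck_tile and without the suffix.
constexpr std::uint32_t example_num_tiles()
{
    ck_tile::ExampleStreamKTilePartitioner p{4, 8};
    return p.num_tiles();
}

static_assert(example_num_tiles() == 32, "4 x 8 tiles expected");
```

Only the qualified name changes in this sketch; construction and member access are otherwise untouched.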

Note: This PR depends on #3157. I will mark it as ready for review once #3157 has merged.

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

  • I have added tests relevant to the introduced functionality, and the unit tests are passing locally
  • I have added the test to REGRESSION_TESTS list defined at the top of CMakeLists.txt in tests/CMakeLists.txt, IF the test takes more than 30 seconds to run.
  • I have added inline documentation that helps the maintainers understand the motivation
  • I have removed the stale documentation which is no longer relevant after this pull request
  • (If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
  • I have run clang-format on all changed files
  • Any dependent changes have been merged

CongMa13 and others added 7 commits November 7, 2025 20:33
…k GEMM.

- This commit lays the groundwork for integrating the tile engine into streamk GEMM.
  It focuses on creating benchmark executables for streamk GEMM.
- Additional scripts like test_benchmark.sh and gemm_benchmark.py will be added once
  the streamk implementation reaches stability.
* Add gtests for compiler CI for faster testing

* Add changes to have a custom target

* Add a gtest suite for gemm kernel for running CI tests with compiler mode

* Fix Clang error (EOL)

* Removed compiler subfolder from CMake

* Add gtest suite for gemm kernel

* Disable failed tests

* Fix build errors

* Resolved PR comments

* Update shape for persistent gemm kernel test

* Separated types by H/W archs

* Made changes to persistent types

* Fix persistent build failure issue

---------

Co-authored-by: Thomas Ning <[email protected]>
@ecamartins ecamartins self-assigned this Nov 12, 2025
The original CK Stream-K implementation used a Tile Partitioner that was
based on old CK's Stream-K block-to-C-tile map. However, old CK's
implementation did not align with the original Stream-K paper. Thus, we
implemented a new Tile Partitioner and an associated Stream-K Kernel. The
kernel implementation was placed in the reboot namespace. Now that all
functionality for the new implementation is in place, this commit makes
the following changes:
- Removes all uses of the old CK Tile Stream-K implementation.
- Removes the reboot namespace such that the new implementation is in
  the ck_tile namespace only.
- Adds tests for fp8 and bf8 for the new implementation as these were
  only in place for the old implementation.
- Removes the old CK Tile Stream-K Tile partitioner.
- Removes the v2 suffix from the new CK Tile Tile Partitioner derived
  classes.
@ecamartins ecamartins force-pushed the emimarti/ck_tile/remove_old_streamk branch from d0a9119 to 2ef3ae5 Compare November 12, 2025 18:49