

@samremes commented Jul 2, 2025

Proposed changes

Bring the non-K-major LDS layouts (column-major A and row-major B) from old CK into the ck-tile GEMM.
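For context on the term: in a GEMM where A is M x K and B is K x N, a K-major operand is one whose K index is the contiguous (stride-1) dimension in memory. Row-major A and column-major B are K-major. Column-major A and row-major B are not, and those are the two cases the commits in this PR give dedicated LDS descriptors. The sketch below only illustrates the stride arithmetic. `Layout`, `offset_a`, `offset_b`, `k_stride_a` and `k_stride_b` are hypothetical names, not ck-tile API.

```cpp
#include <cstddef>

// Hypothetical layout tag for illustration; not the ck-tile layout type.
enum class Layout { RowMajor, ColumnMajor };

// Linear offset of A(m, k) for an M x K matrix A.
constexpr std::size_t offset_a(std::size_t m, std::size_t k,
                               std::size_t M, std::size_t K, Layout layout)
{
    return layout == Layout::RowMajor ? m * K + k : k * M + m;
}

// Linear offset of B(k, n) for a K x N matrix B.
constexpr std::size_t offset_b(std::size_t k, std::size_t n,
                               std::size_t K, std::size_t N, Layout layout)
{
    return layout == Layout::RowMajor ? k * N + n : n * K + k;
}

// Distance in elements between neighbours along K; 1 means "K-major".
constexpr std::size_t k_stride_a(std::size_t M, std::size_t K, Layout layout)
{
    return offset_a(0, 1, M, K, layout) - offset_a(0, 0, M, K, layout);
}

constexpr std::size_t k_stride_b(std::size_t K, std::size_t N, Layout layout)
{
    return offset_b(1, 0, K, N, layout) - offset_b(0, 0, K, N, layout);
}

// Example with M = 4, N = 8, K = 16.
static_assert(k_stride_a(4, 16, Layout::RowMajor) == 1, "row-major A is K-major");
static_assert(k_stride_a(4, 16, Layout::ColumnMajor) == 4, "col-major A is M-major");
static_assert(k_stride_b(16, 8, Layout::ColumnMajor) == 1, "col-major B is K-major");
static_assert(k_stride_b(16, 8, Layout::RowMajor) == 8, "row-major B is N-major");
```

In the non-K-major cases, contiguous vector loads from global memory run along M (for A) or N (for B) rather than along K. That mismatch is what a separate LDS arrangement has to account for.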

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

  • I have added tests relevant to the introduced functionality, and the unit tests are passing locally
  • I have added the test to the REGRESSION_TESTS list defined at the top of tests/CMakeLists.txt, IF the test takes more than 30 seconds to run.
  • I have added inline documentation that helps the maintainers understand the motivation
  • I have removed the stale documentation which is no longer relevant after this pull request
  • (If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
  • I have run clang-format on all changed files
  • Any dependent changes have been merged

Discussion

If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered

@aosewski marked this pull request as draft October 9, 2025 10:07
@aosewski marked this pull request as ready for review October 13, 2025 12:25
@aosewski merged commit d2bbca3 into develop Oct 13, 2025
42 of 46 checks passed
@aosewski deleted the samremes/optimized_lds_non_kmajor branch October 13, 2025 12:27
illsilin added a commit that referenced this pull request Oct 14, 2025
illsilin added a commit that referenced this pull request Oct 14, 2025
@ThomasNing restored the samremes/optimized_lds_non_kmajor branch October 15, 2025 15:13
AviralGoelAMD pushed a commit that referenced this pull request Oct 16, 2025
* Enable the adapted LDS B layout for Row-Major

* fix formatting

* Implement specialized col-major A LDS block descriptor

* Fix formatting

* Use VecLoadSize for AK1/BK1

* Fix some thread access pattern values

* Use GetVectorSizeA for A

* Fix formatting

* Add extra condition to avoid division by zero

* disable layout for wave32

* remove extra else

* fix formatting

* Fix formatting

* Rename one remaining TileDistributionEncodingPattern2D

* Use integer ceil division

* revert remod.py changes

* also revert utility.hpp

* use getA/BTileAccessPattern everywhere

* use integer_divide_ceil for AK0 too

---------

Co-authored-by: Adam Osewski <[email protected]>
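Several commits above ("Use VecLoadSize for AK1/BK1", "Use integer ceil division", "use integer_divide_ceil for AK0 too", "Add extra condition to avoid division by zero") concern how the K tile is split into AK0 x AK1. The minimal sketch below assumes AK1 is the vector load size along K and AK0 = ceil(KPerBlock / AK1). `divide_ceil` is a hypothetical stand-in for the `integer_divide_ceil` helper named in the commit message, and `compute_ak0` is hypothetical too. The zero-divisor guard shown is an assumed example, not the actual condition added in the PR.

```cpp
#include <cstdint>

// Hypothetical stand-in for integer_divide_ceil: rounds x / y up instead of
// truncating. Assumes x >= 0 and y > 0.
constexpr std::int32_t divide_ceil(std::int32_t x, std::int32_t y)
{
    return (x + y - 1) / y;
}

// Hypothetical split of a K tile into AK0 x AK1, with AK1 = vector load size.
// A non-positive vector size would make the division undefined, so this
// sketch guards it and returns 0 (an illustrative choice only).
constexpr std::int32_t compute_ak0(std::int32_t k_per_block, std::int32_t vec_load_size)
{
    return vec_load_size > 0 ? divide_ceil(k_per_block, vec_load_size) : 0;
}

static_assert(compute_ak0(64, 8) == 8, "exact split");
static_assert(compute_ak0(40, 16) == 3, "remainder rounds up instead of down");
static_assert(compute_ak0(64, 0) == 0, "zero vector size is guarded");
```

Truncating division would give 40 / 16 = 2, which covers only 32 of the 40 K elements. Ceil division gives 3, so any remainder is still covered.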
AviralGoelAMD pushed a commit that referenced this pull request Oct 16, 2025
samremes added a commit that referenced this pull request Nov 12, 2025