@j-stephan
Contributor

Motivation

This PR adds Composable Kernel's ck_tile examples to this repository.

Technical Details

This PR only targets ROCm + Linux; Windows and CUDA are not supported by Composable Kernel.

Test Plan

Test Result

Submission Checklist

@j-stephan j-stephan self-assigned this Oct 20, 2025
@j-stephan j-stephan force-pushed the ComposableKernel branch 2 times, most recently from bc1a295 to 05d8ec8 Compare October 28, 2025 13:06
@j-stephan j-stephan marked this pull request as ready for review October 28, 2025 13:14
@j-stephan j-stephan requested review from a team as code owners October 28, 2025 13:14
@j-stephan
Contributor Author

The failing markdown linter will be resolved once ROCm/rocm-docs-core#1449 is merged.

@j-stephan
Contributor Author

This PR requires ROCm 7.1. Once #341 is merged, the build errors should disappear.

Copilot AI left a comment

Pull Request Overview

This PR adds Composable Kernel's ck_tile examples to the ROCm Examples repository, focusing exclusively on ROCm and Linux platforms (CUDA and Windows are not supported). The examples demonstrate various GPU operations using CK Tile's programming model, including GEMM operations, convolutions, and basic tensor operations.

Key Changes

  • Added comprehensive examples for GEMM operations (batched, block-scale, flatmm, multi-d, grouped)
  • Introduced grouped convolution examples (forward and backward weight)
  • Implemented basic operations (elementwise, reduce, permute, img2col)
  • Provided build infrastructure through CMake and Makefiles with architecture-specific support checks

Reviewed Changes

Copilot reviewed 111 out of 281 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| Libraries/ComposableKernel/gemm/flatmm/flatmm_basic.cpp | Implements FLATMM GEMM kernel with tile partitioning and pipeline configuration |
| Libraries/ComposableKernel/gemm/block_scale_gemm/gemm_aquant_basic.cpp | Implements block-scale quantized GEMM with group quantization support |
| Libraries/ComposableKernel/gemm/batched_gemm/batched_gemm.cpp | Implements batched GEMM operations with configurable pipeline strategies |
| Libraries/ComposableKernel/convolution/grouped_convolution/grouped_convolution_forward.cpp | Implements grouped convolution forward pass |
| Libraries/ComposableKernel/basic/reduce/reduce.cpp | Demonstrates 2D reduction operations with block tiling |
| Libraries/ComposableKernel/basic/permute/permute.cpp | Generic tensor permutation with matrix-core optimized alternative |
| CMakeLists.txt and Makefile files | Build configuration with architecture checks for gfx908/gfx90a/gfx942/gfx950 |
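
The architecture check listed in the last row could be mirrored on the host side roughly as follows. This is only a sketch, not code from the PR: the four architecture names are taken from the summary above, while the helper name `is_supported_arch` and the handling of target feature suffixes (for example `:xnack-`) are assumptions made for illustration. The PR itself performs this check in its CMake and Makefile configuration.

```cpp
#include <array>
#include <string_view>

// Architectures named in the review summary above.
constexpr std::array<std::string_view, 4> supported_archs{
    "gfx908", "gfx90a", "gfx942", "gfx950"};

// Hypothetical helper (not part of the PR): returns true if the given
// GPU target string refers to one of the architectures listed above.
constexpr bool is_supported_arch(std::string_view target)
{
    // Assumption: feature suffixes such as ":sramecc+:xnack-" may follow the
    // architecture name, so only the part before the first ':' is compared.
    const auto base = target.substr(0, target.find(':'));
    for (const auto arch : supported_archs)
    {
        if (base == arch)
        {
            return true;
        }
    }
    return false;
}
```

Because the helper is `constexpr`, it can also be used in a `static_assert` when the target name is known at compile time.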

Collaborator

@zichguan-amd zichguan-amd left a comment

LGTM

@vidyasagar-amd vidyasagar-amd left a comment

Thanks for the addition

Contributor

@adeljo-amd adeljo-amd left a comment

LGTM

4 participants