-
Notifications
You must be signed in to change notification settings - Fork 249
Tile engine for streamk #3157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Tile engine for streamk #3157
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds GEMM StreamK support to the tile engine, introducing a new highly configurable GEMM kernel framework with stream-k partitioning support for AMD GPUs (gfx90a, gfx942, gfx950).
Key changes:
- Adds Python-based kernel instance builder with parallel generation support and comprehensive validation
- Implements C++ profiler and benchmark infrastructure for single-kernel and multi-kernel testing
- Removes
reduction_strategyfield fromStreamKHostArgsto make it a compile-time template parameter
Reviewed Changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| tile_engine/ops/gemm_streamk/gemm_streamk_validation_utils.py | Validation utilities for tile configurations, GPU detection, and constraint checking |
| tile_engine/ops/gemm_streamk/gemm_streamk_instance_builder.py | Python script to generate individual GEMM kernel instances with parallel processing |
| tile_engine/ops/gemm_streamk/gemm_streamk_profiler.hpp | C++ profiler class for benchmarking kernel performance |
| tile_engine/ops/gemm_streamk/gemm_streamk_benchmark_single.cpp | Single-kernel benchmark executable |
| tile_engine/ops/gemm_streamk/gemm_streamk_benchmark.hpp | Benchmark infrastructure with problem/result structures |
| tile_engine/ops/gemm_streamk/gemm_streamk_common.hpp | Common types, traits, and utility functions |
| tile_engine/ops/gemm_streamk/CMakeLists.txt | Build configuration with individual target generation |
| tile_engine/ops/gemm_streamk/configs/default_config.json | Default configuration for kernel generation |
| include/ck_tile/ops/gemm/kernel/streamk_gemm_kernel.hpp | Refactored to use compile-time reduction strategy |
| example/ck_tile/40_streamk_gemm/*.cpp | Updated examples to use compile-time reduction strategy |
| Jenkinsfile | Added CI pipeline integration |
Comments suppressed due to low confidence (1)
tile_engine/ops/gemm_streamk/gemm_streamk_instance_builder.py:758
- Normal methods should have 'self', rather than 'work_item', as their first parameter.
def _generate_single_kernel_individual(work_item):
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Filter out unsupported trait combinations | ||
| combinations = [] | ||
| for combo in all_combinations: | ||
| pipeline, epilogue, scheduler, reduction_strategy = combo[:4] |
Copilot
AI
Nov 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The unpacking order is incorrect. combo is created from itertools.product(pipelines, epilogues, schedulers, pad_m_values, pad_n_values, pad_k_values, persistent_values, reduction_strategy_value) (lines 256-266), so the first 4 elements are (pipeline, epilogue, scheduler, pad_m), not (pipeline, epilogue, scheduler, reduction_strategy). This causes validation to check wrong parameters. Should be: pipeline, epilogue, scheduler, pad_m, pad_n, pad_k, persistent, reduction_strategy = combo
| pipeline, epilogue, scheduler, reduction_strategy = combo[:4] | |
| pipeline, epilogue, scheduler, pad_m, pad_n, pad_k, persistent, reduction_strategy = combo |
| ) | ||
| else: | ||
| # Fallback to minimal default | ||
| combinations = [("compv3", "cshuffle", "intrawave", "reduction_strategy", False, False, False, False)] |
Copilot
AI
Nov 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fallback default combination has incorrect tuple structure. Based on the combo structure from lines 256-266, the 4th element should be pad_m (boolean), not the string "reduction_strategy". Should be: combinations = [(\"compv3\", \"cshuffle\", \"intrawave\", False, False, False, False, \"reduction\")] to match the order (pipeline, epilogue, scheduler, pad_m, pad_n, pad_k, persistent, reduction_strategy).
| combinations = [("compv3", "cshuffle", "intrawave", "reduction_strategy", False, False, False, False)] | |
| combinations = [("compv3", "cshuffle", "intrawave", False, False, False, False, "reduction")] |
| def _generate_single_kernel_individual(work_item): | ||
| """Worker function to generate a single individual kernel file""" | ||
| tile_config, trait_combo, working_path, datatype, layout = work_item | ||
|
|
||
| # Create a temporary builder instance for this worker | ||
| builder = GemmKernelBuilder(working_path, datatype, layout) | ||
|
|
||
| try: | ||
| kernel_name, instance_code = builder._generate_kernel_instance( | ||
| tile_config, trait_combo | ||
| ) | ||
|
|
||
| # Create simplified filename without the "gemm_" prefix | ||
| # Remove "gemm_" from the beginning of kernel_name for the filename | ||
| simplified_name = kernel_name | ||
| if simplified_name.startswith("gemm_"): | ||
| simplified_name = simplified_name[5:] # Remove "gemm_" prefix | ||
|
|
||
| # Write individual header file | ||
| header_file = working_path / f"gemm_streamk_single_{simplified_name}.hpp" | ||
| with open(header_file, "w") as f: | ||
| f.write(instance_code) | ||
|
|
||
| return (kernel_name, trait_combo, tile_config) | ||
| except Exception as e: | ||
| print(f"Error generating individual kernel: {e}") | ||
| return None |
Copilot
AI
Nov 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is defined inside the class but without self parameter and with incorrect indentation. It should be defined at module level (unindented) or as a static method. This will cause the function to not be accessible when called from executor.submit(_generate_single_kernel_individual, item) at line 634.
| def _generate_single_kernel_individual(work_item): | |
| """Worker function to generate a single individual kernel file""" | |
| tile_config, trait_combo, working_path, datatype, layout = work_item | |
| # Create a temporary builder instance for this worker | |
| builder = GemmKernelBuilder(working_path, datatype, layout) | |
| try: | |
| kernel_name, instance_code = builder._generate_kernel_instance( | |
| tile_config, trait_combo | |
| ) | |
| # Create simplified filename without the "gemm_" prefix | |
| # Remove "gemm_" from the beginning of kernel_name for the filename | |
| simplified_name = kernel_name | |
| if simplified_name.startswith("gemm_"): | |
| simplified_name = simplified_name[5:] # Remove "gemm_" prefix | |
| # Write individual header file | |
| header_file = working_path / f"gemm_streamk_single_{simplified_name}.hpp" | |
| with open(header_file, "w") as f: | |
| f.write(instance_code) | |
| return (kernel_name, trait_combo, tile_config) | |
| except Exception as e: | |
| print(f"Error generating individual kernel: {e}") | |
| return None | |
| def _generate_single_kernel_individual(work_item): | |
| """Worker function to generate a single individual kernel file""" | |
| tile_config, trait_combo, working_path, datatype, layout = work_item | |
| # Create a temporary builder instance for this worker | |
| builder = GemmKernelBuilder(working_path, datatype, layout) | |
| try: | |
| kernel_name, instance_code = builder._generate_kernel_instance( | |
| tile_config, trait_combo | |
| ) | |
| # Create simplified filename without the "gemm_" prefix | |
| # Remove "gemm_" from the beginning of kernel_name for the filename | |
| simplified_name = kernel_name | |
| if simplified_name.startswith("gemm_"): | |
| simplified_name = simplified_name[5:] # Remove "gemm_" prefix | |
| # Write individual header file | |
| header_file = working_path / f"gemm_streamk_single_{simplified_name}.hpp" | |
| with open(header_file, "w") as f: | |
| f.write(instance_code) | |
| return (kernel_name, trait_combo, tile_config) | |
| except Exception as e: | |
| print(f"Error generating individual kernel: {e}") | |
| return None |
| trait_parts[3] == "false", # pad_m | ||
| trait_parts[4] == "false", # pad_n | ||
| trait_parts[5] == "false", # pad_k | ||
| trait_parts[6] , # persistent |
Copilot
AI
Nov 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing boolean conversion for persistent field. Lines 868-870 convert pad_m/pad_n/pad_k using == \"false\" (which produces True when the string is "false"), but line 871 passes the raw string. This inconsistency will cause type mismatches. Should be: trait_parts[6] == \"true\", # persistent to convert to boolean.
| trait_parts[6] , # persistent | |
| trait_parts[6] == "true", # persistent |
Jenkinsfile
Outdated
| --warmup 5 --repeat 5 --verbose --json results.json && \ | ||
| ninja -j64 benchmark_gemm_multi_d_all && \ | ||
| python3 ../tile_engine/ops/gemm_multi_d/gemm_multi_d_benchmark.py . --problem-sizes "1024,1024,1024" \ | ||
| --warmup 5 --repeat 5 --verbose --json results.json """ |
Copilot
AI
Nov 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing line continuation backslash and logical separator. Line 1647 ends with closing triple quotes, but line 1648 continues with more commands. This breaks the shell command syntax. Should add && \\ at the end of line 1647 before the closing quotes, or start a new command string on line 1648.
| --warmup 5 --repeat 5 --verbose --json results.json """ | |
| --warmup 5 --repeat 5 --verbose --json results.json && \" |
Jenkinsfile
Outdated
| --warmup 5 --repeat 5 --verbose --json results.json && \ | ||
| ninja -j64 benchmark_gemm_multi_d_all && \ | ||
| python3 ../tile_engine/ops/gemm_multi_d/gemm_multi_d_benchmark.py . --problem-sizes "1024,1024,1024" \ | ||
| --warmup 5 --repeat 5 --verbose --json results.json """ |
Copilot
AI
Nov 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing line continuation backslash and logical separator (duplicate of earlier issue in different location). Line 1692 ends with closing triple quotes, but line 1693 continues with more commands. This breaks the shell command syntax. Should add && \\ at the end of line 1692 before the closing quotes, or start a new command string on line 1693.
| --warmup 5 --repeat 5 --verbose --json results.json """ | |
| --warmup 5 --repeat 5 --verbose --json results.json && \ |
| scheduler_type_map = { | ||
| "intrawave": "ck_tile::GemmPipelineScheduler::Intrawave", | ||
| "interwave": "ck_tile::GemmPipelineScheduler::Interwave", | ||
| "default": "ck_tile::GemmPipelineScheduler::Default", | ||
| } |
Copilot
AI
Nov 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Variable scheduler_type_map is not used.
| scheduler_type_map = { | |
| "intrawave": "ck_tile::GemmPipelineScheduler::Intrawave", | |
| "interwave": "ck_tile::GemmPipelineScheduler::Interwave", | |
| "default": "ck_tile::GemmPipelineScheduler::Default", | |
| } |
| base_pipeline_map = { | ||
| "mem": "ck_tile::BaseGemmPipelineAgBgCrMem", | ||
| "compv3": "ck_tile::BaseGemmPipelineAgBgCrCompV3", | ||
| "compv4": "ck_tile::BaseGemmPipelineAgBgCrCompV4", | ||
| } |
Copilot
AI
Nov 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Variable base_pipeline_map is not used.
| base_pipeline_map = { | |
| "mem": "ck_tile::BaseGemmPipelineAgBgCrMem", | |
| "compv3": "ck_tile::BaseGemmPipelineAgBgCrCompV3", | |
| "compv4": "ck_tile::BaseGemmPipelineAgBgCrCompV4", | |
| } |
| file << "rocm_version,device_name," | ||
| << "split_k,m,n,k,stride_a,stride_b,stride_c," | ||
| << "dtype_a,dtype_b,dtype_acc,dtype_c," << "layout_a,layout_b,layout_c," | ||
| << "structured_sparsity," << "name," | ||
| << "latency(ms),tflops(TFlops),bandwidth(GB/s),metric\n"; | ||
| } | ||
|
|
||
| const auto& problem = kernel_instance.problem_; | ||
| const auto& name = kernel_instance.name_; | ||
| const auto& perf = kernel_instance.perf_result_; | ||
|
|
||
| file << get_rocm_version() << "," << ck_tile::get_device_name() << "," | ||
| << problem.split_k_ << "," << problem.m_ << "," << problem.n_ << "," | ||
| << problem.k_ << "," << problem.stride_a_ << "," << problem.stride_b_ << "," | ||
| << problem.stride_c_ << "," << problem.dtype_a_ << "," << problem.dtype_b_ | ||
| << "," << problem.dtype_acc_ << "," << problem.dtype_c_ << "," | ||
| << problem.layout_a_ << "," << problem.layout_b_ << "," << problem.layout_c_ | ||
| << "," << problem.structured_sparsity_ << "," << name << "," << std::fixed | ||
| << std::setprecision(4) << perf.latency_ << "," << std::fixed | ||
| << std::setprecision(4) << perf.tflops_ << "," << std::fixed | ||
| << std::setprecision(4) << perf.bandwidth_ << "," << get_metric_name(metric) | ||
| << "\n"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to add fields for reduction strategy type and for persistent?
…k GEMM. - This commit lays the groundwork for integrating the tile engine into streamk GEMM. It focuses on creating benchmark executables for streamk GEMM. - Additional scripts like test_benchmark.sh and gemm_benchmark.py will be added once the streamk implementation reaches stability.
…ated implementation
70ab446 to
7363096
Compare
…put of CK TILE STREAMK
Proposed changes
We copied the tile engine code of gemm and revise it to align with the StreamK.
Note: The entire tile engine will be refactored to extract common functionality.
reduction_strategyto default_config.json so that there are instances foratomicandreductionpersistent==trueto default_config.jsonbenchmark instances are
Checklist
Please put an
xinto the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.clang-formaton all changed filesDiscussion
If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered