@LeiWang1999 (Contributor)

This pull request makes several significant updates to the bitblas library, focusing on enhancing code modularity, improving kernel name generation, and refining CUDA initialization. The most important changes are grouped by theme below.

Enhancements to Kernel Name Generation:

  • Added a MatmulKernelNameGenerator class in bitblas/ops/general_matmul/__init__.py that generates kernel names from the operator configuration and tuning hints; it includes methods for serializing hints and simplifying data-type names.
  • Introduced a BaseKernelNameGenerator abstract class in bitblas/ops/operator.py to provide a base for kernel name generators.
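To make the naming scheme concrete, here is a hypothetical sketch in the spirit of MatmulKernelNameGenerator; the real class lives in bitblas/ops/general_matmul/__init__.py, and its actual API and helper names (simplify_dtype, generate_kernel_name) are assumptions here, not the library's interface.

```python
def simplify_dtype(dtype: str) -> str:
    """Shorten a dtype string, e.g. "float16" -> "f16", "int8" -> "i8"."""
    return dtype.replace("float", "f").replace("int", "i")

def generate_kernel_name(n: int, k: int, a_dtype: str, w_dtype: str,
                         hint: str = "") -> str:
    """Serialize shape, dtypes, and an optional tuning hint into a kernel name."""
    name = f"matmul_n{n}k{k}_A{simplify_dtype(a_dtype)}W{simplify_dtype(w_dtype)}"
    if hint:
        name += f"_{hint}"
    return name

# Reproduces the style of name seen in the generated wrapper below.
print(generate_kernel_name(256, 256, "float16", "float16", hint="simt_opt_m_1"))
# matmul_n256k256_Af16Wf16_simt_opt_m_1
```

Encoding the configuration into the name makes generated kernels self-describing and keeps names stable across tuning runs for the same configuration.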

Improvements in CUDA Initialization:

  • Refactored the CUDA initialization functions in bitblas/builder/wrapper/tir.py to use predefined templates for setting kernel attributes and defining functions.
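The template approach can be sketched as follows. This is an illustrative mock-up, not the actual code in bitblas/builder/wrapper/tir.py; the template names and render_init helper are invented for the example.

```python
# Template for the generated init() entry point.
INIT_FUNC_TEMPLATE = """extern "C" void init() {{
{body}
}}"""

# Template for one attribute-setting statement inside init().
SET_ATTR_TEMPLATE = (
    "    cudaFuncSetAttribute({kernel}, "
    "cudaFuncAttributeMaxDynamicSharedMemorySize, {smem});"
)

def render_init(kernels):
    """kernels: list of (kernel_name, dynamic_smem_bytes) pairs."""
    body = "\n".join(
        SET_ATTR_TEMPLATE.format(kernel=name, smem=smem)
        for name, smem in kernels
    )
    return INIT_FUNC_TEMPLATE.format(body=body)

print(render_init([("matmul_kernel", 41472)]))
```

Centralizing the boilerplate in templates means new attributes or calling conventions only need to be changed in one place instead of in every code-generation path.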

Updates to Function and Module Handling:

  • Modified the create_dispatch_mod and fast_tune_with_dynamic_range functions in bitblas/base/utils.py to handle additional parameters and improve dispatching.
  • Renamed references from optimized_func to optimized_mod across several files to reflect the updated structure of optimized modules.
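The dynamic-range dispatch that these functions support can be sketched as a threshold lookup over tuned kernels, mirroring the branching in the generated call() wrapper shown later in this description. The select_kernel helper and its signature are assumptions for illustration, not the fast_tune_with_dynamic_range API.

```python
def select_kernel(m, tuned):
    """Pick the kernel tuned for the smallest dynamic range covering m.

    tuned: list of (upper_bound_m, kernel_name), sorted by ascending bound.
    """
    for bound, kernel in tuned:
        if m <= bound:
            return kernel
    # m exceeds every tuned bound: fall back to the largest-range kernel.
    return tuned[-1][1]

tuned = [(1, "simt_opt_m_1"), (256, "tc_opt_m_256")]
print(select_kernel(1, tuned))    # simt_opt_m_1
print(select_kernel(512, tuned))  # tc_opt_m_256
```

At code-generation time the same thresholds are unrolled into an if/else-if chain, so the runtime dispatch costs only a few comparisons.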

Code Modularization:

  • Added import statements for the new modules and classes in bitblas/ops/general_matmul/__init__.py and bitblas/ops/operator.py to improve code modularity.

These changes collectively enhance the maintainability and functionality of the bitblas library, particularly in the areas of kernel name generation and CUDA initialization.

For example, a tuned kernel can now be wrapped with the following generated host code:

extern "C" void init() {
    cudaFuncSetAttribute(matmul_n256k256_Af16Wf16_tcx16x64x128w16x16xp2_opt_m_256, cudaFuncAttributeMaxDynamicSharedMemorySize, 41472);
}

extern "C" void call(half* __restrict__ A, half* __restrict__ B, half* __restrict__ C, int m, cudaStream_t stream = cudaStreamDefault) {
    if (m == 0) return;
    if (m <= 1) {
        matmul_n256k256_Af16Wf16_simt_opt_m_1<<<dim3(64, 1, 1), dim3(32, 4, 1), 0, stream>>>(A, B, C, m);
    }
    else if (m <= 256) {
        matmul_n256k256_Af16Wf16_tcx16x64x128w16x16xp2_opt_m_256<<<dim3(4, (m + 15) / 16, 1), dim3(32, 1, 4), 41472, stream>>>(A, B, C, m);
    }
    else {
        // m > 256: fall back to the kernel tuned for the largest dynamic range
        matmul_n256k256_Af16Wf16_tcx16x64x128w16x16xp2_opt_m_256<<<dim3(4, (m + 15) / 16, 1), dim3(32, 1, 4), 41472, stream>>>(A, B, C, m);
    }
}

@LeiWang1999 LeiWang1999 merged commit f40d9ba into microsoft:main Aug 24, 2024