
Conversation

@chengjunlu (Contributor) commented on Sep 25, 2025:

This PR adds a new "zebin" compilation stage to the XPU backend, aligning it with the CUDA compilation stages in triton.compile. The change introduces zebin as a binary-format alternative to SPIRV for Intel XPU targets.

chengjunlu changed the title from "Add a new stage to generate zebin to align CUDA stages in triton.compile" to "[Draft] Add a new stage to generate zebin to align CUDA stages in triton.compile" on Sep 25, 2025
chengjunlu force-pushed the chengjun/add_zebin_stage branch from 4cba65d to a943a26 on September 25, 2025 06:00
etiotto requested a review from Copilot on September 25, 2025 14:58
Copilot AI (Contributor) left a comment

Pull Request Overview

This PR adds a new "zebin" compilation stage to the XPU backend, aligning it with the CUDA compilation stages in triton.compile. The change introduces zebin as a binary-format alternative to SPIRV for Intel XPU targets.

  • Adds a make_zebin method to generate the zebin binary format from SPIRV input (see the sketch after this list)
  • Updates the binary extension from "spv" to "zebin" for the XPU backend
  • Modifies the compilation pipeline to handle zebin as a binary format alongside cubin and hsaco
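
For a concrete picture, here is a minimal sketch of what an ocloc-based make_zebin stage could look like. The function signature, flag set, and the default "pvc" device id are illustrative assumptions rather than the PR's exact code:

```python
import os
import subprocess
import tempfile


def make_zebin(spirv: bytes, device: str = "pvc") -> bytes:
    """Compile a SPIR-V module to a zebin native binary via ocloc (sketch).

    A real stage would take its device id and build flags from the
    compilation metadata/options instead of hard-coding them here.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "kernel.spv")
        with open(src, "wb") as f:
            f.write(spirv)
        # -spirv_input marks the input as SPIR-V (not OpenCL C source);
        # -output/-output_no_suffix pin the exact output file name.
        subprocess.run(
            ["ocloc", "compile", "-file", src, "-spirv_input",
             "-device", device, "-output", "kernel",
             "-output_no_suffix", "-out_dir", tmp],
            check=True,
        )
        with open(os.path.join(tmp, "kernel"), "rb") as f:
            return f.read()
```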

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| third_party/intel/backend/compiler.py | Adds the zebin compilation stage and updates the binary extension |
| python/triton/compiler/compiler.py | Updates file parsing and the compilation pipeline to support the zebin format |


@etiotto (Contributor) left a comment

Instead of using ocloc to generate the native binary, can we use L0 to generate it?

How about trying: https://oneapi-src.github.io/level-zero-spec/level-zero/latest/core/PROG.html#module-caching-with-native-binaries
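
The core of that linked flow is zeModuleGetNativeBinary, which reads the device-native binary back out of an already-built module. Below is a rough Python/ctypes sketch, assuming the Level Zero loader is installed; note that the module handle it consumes must first be built with zeModuleCreate, which itself requires driver, context, and device handles:

```python
import ctypes

# Assumes the Level Zero loader library is available on this system.
ze = ctypes.CDLL("libze_loader.so.1")


def module_native_binary(hmodule: ctypes.c_void_p) -> bytes:
    """Fetch the native binary from a built ze_module handle (sketch only)."""
    size = ctypes.c_size_t(0)
    # First call queries the size of the native binary...
    ze.zeModuleGetNativeBinary(hmodule, ctypes.byref(size), None)
    buf = (ctypes.c_uint8 * size.value)()
    # ...the second call copies the bytes out.
    ze.zeModuleGetNativeBinary(hmodule, ctypes.byref(size), buf)
    return bytes(buf)
```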

etiotto marked this pull request as draft on October 9, 2025 14:10
chengjunlu force-pushed the chengjun/add_zebin_stage branch from a943a26 to c7cbf86 on October 29, 2025 03:32
chengjunlu marked this pull request as ready for review on October 29, 2025 03:32
chengjunlu linked an issue on Oct 29, 2025 that may be closed by this pull request
chengjunlu force-pushed the chengjun/add_zebin_stage branch from c7cbf86 to f2186c2 on October 29, 2025 03:50
chengjunlu changed the title from "[Draft] Add a new stage to generate zebin to align CUDA stages in triton.compile" to "Add a new stage to generate zebin to align CUDA stages in triton.compile" on Oct 29, 2025
chengjunlu force-pushed the chengjun/add_zebin_stage branch from f2186c2 to dabaee1 on October 29, 2025 04:28
@chengjunlu (Contributor, Author) replied:

> Instead of using ocloc to generate the native binary, can we use L0 to generate it?
>
> How about trying: https://oneapi-src.github.io/level-zero-spec/level-zero/latest/core/PROG.html#module-caching-with-native-binaries

The L0 API requires passing a device context, which is not available in the triton.compile context.

chengjunlu force-pushed the chengjun/add_zebin_stage branch 2 times, most recently from bbd3669 to b66f0b5, on October 29, 2025 07:13
The next self-review thread is on these templated C++ lines in the diff:

```cpp
size_t global_range_y = {gridY};
size_t global_range_z = {gridZ};
size_t local_range_x = {num_warps} * {threads_per_warp};
if (driver_version.find("+") != std::string::npos) {{
```
@chengjunlu (Author) commented:

This code doesn't make sense. Remove it.

The following thread is on the IR-dumping code in python/triton/compiler/compiler.py:

```python
# stores the text of each level of IR that was generated during compilation
asm_files = [Path(p) for c, p in metadata_group.items() if not c.endswith(".json")]

def read_file(path):
```
Contributor: Why?

@chengjunlu (Author): Both spv and zebin are binary formats; this is so the intermediate files can be dumped either as text or as binary.

Contributor: Maybe it's worth rewriting this without exceptions? They are usually noticeably slower.

@chengjunlu (Author): I added a new implementation that follows the parse function and avoids exceptions.
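
Roughly, the exception-free variant can pick text vs. binary reads up front from the file suffix instead of attempting a UTF-8 decode and catching the failure. A minimal sketch, assuming the binary-extension set introduced in this PR:

```python
from pathlib import Path

# Artifacts with these suffixes are binary; everything else is text IR.
BINARY_EXTS = {"spv", "zebin"}


def read_file(path: Path):
    # Choose the read mode from the suffix instead of relying on a
    # UnicodeDecodeError to detect binary content.
    if path.suffix.lstrip(".") in BINARY_EXTS:
        return path.read_bytes()
    return path.read_text()
```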

chengjunlu and others added 3 commits on October 31, 2025 08:55:
… or option = {"generate_native_code": 1}.

Signed-off-by: Lu,Chengjun <[email protected]>
Signed-off-by: Lu,Chengjun <[email protected]>
@anmyachev (Contributor) left a comment

chengjunlu merged commit 9e23713 into main on Nov 3, 2025
23 checks passed
chengjunlu deleted the chengjun/add_zebin_stage branch on November 3, 2025 03:26
@whitneywhtsang (Contributor) left a comment

LGTM; there is no change by default. When generate_native_code is true, rather than replacing the spv stage with a zebin stage, an additional stage is added to generate the zebin.
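
In other words, the stage wiring behaves roughly like the sketch below; the method names and options field mirror the discussion here but are illustrative, not the merged code:

```python
def add_stages(self, stages, options):
    # Default pipeline (unchanged): the final artifact is SPIR-V.
    stages["ttir"] = lambda src, metadata: self.make_ttir(src, metadata, options)
    stages["ttgir"] = lambda src, metadata: self.make_ttgir(src, metadata, options)
    stages["llir"] = lambda src, metadata: self.make_llir(src, metadata, options)
    stages["spv"] = lambda src, metadata: self.make_spv(src, metadata, options)
    # Opt-in: append a zebin stage after (not instead of) the spv stage,
    # so nothing changes unless generate_native_code is set.
    if options.generate_native_code:
        stages["zebin"] = lambda src, metadata: self.make_zebin(src, metadata, options)
```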


Development

Successfully merging this pull request may close this issue:

Binary kernel for Inductor static kernel launcher.
