
Conversation

@anmyachev (Contributor) commented on Sep 5, 2025:

Note for reviewers: I've kept only the most basic heuristics. If you have improvements that are already known to work better and don't need testing, we can apply them directly in this pull request. If you have improvements that still need to be tested, please mention them as well, but I'd prefer to land the basic version as quickly as possible and tune it in follow-up PRs.

Pass rate: 84.11% -> 89.04%
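
For context, a minimal sketch of what such a basic heuristic could look like; the function name, signature, and defaults below are illustrative assumptions, not the merged make_opt_flags implementation, and only the tunables discussed later in this PR (block sizes, group_m, split_k) appear in it:

def make_xpu_opt_flags_sketch(m: int, n: int, k: int) -> dict:
    # Hypothetical illustration only: pick conservative tile sizes and a
    # grouped-ordering factor, and leave split_k to a separate heuristic
    # (this PR adds a compute_split_k helper for that).
    block_m = 128 if m >= 128 else 64
    block_n = 128
    block_k = 32
    group_m = 8
    grid_size = ((m + block_m - 1) // block_m) * ((n + block_n - 1) // block_n)
    split_k = 1  # derived from the device's subslice count in the PR; fixed here for simplicity
    return dict(block_m=block_m, block_n=block_n, block_k=block_k,
                group_m=group_m, split_k=split_k, grid_size=grid_size)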

@anmyachev (author):

Looks like I can test these changes using python/triton_kernels/tests/test_matmul.py
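
For example, something along the lines of pytest -vvv python/triton_kernels/tests/test_matmul.py --device xpu (reusing the --device option from the CI command touched later in this PR) should exercise the new heuristics locally; the exact invocation is an assumption rather than something stated in the PR.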

@anmyachev marked this pull request as ready for review on September 8, 2025 16:47
    if split_k > 1:
        pytest.skip("splitK hasn't been fully tested on AMD GPU.")

elif is_xpu():
@anmyachev (author):

Local result: 980 passed, 2340 skipped.

@anmyachev (author):

New results: 8 failed, 1124 passed, 2188 skipped in 1379.15s (0:22:59) (8 cases added to the skiplists)

elif is_xpu():
    if split_k > 1:
        pytest.skip("splitK hasn't been fully tested on INTEL GPU.")
    if "float8_e4m3fn" in act_dtype_str and "float8_e4m3fn" in weight_dtype_str:
@anmyachev (author):

For these cases I see the following:

python/triton_kernels/tests/test_matmul.py::test_op[False-True-True-True-False-128-1000-400-400-ragged-float8_e4m3fn-float8_e4m3fn-3-1-1-1-False-None] - AssertionError: ref_y_scale: 0.004773152060806751, tri_y_scale: 0.005022321827709675
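
For scale: the two values differ by about 5% relative error (0.005022 - 0.004773 ≈ 0.000249, and 0.000249 / 0.004773 ≈ 0.052), i.e. the float8 output scale is noticeably off rather than wildly wrong.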

generate_native_code: bool = False
advanced_path: bool = False
enable_tile_load_linear_layout: bool = True
arch: str = None
@anmyachev (author):

Otherwise, new tests don't work.
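
These read like backend compile-option fields; a minimal sketch of how such defaults might be declared, assuming an options dataclass (the XPUOptions name and frozen-dataclass shape are illustrative, and arch is typed Optional here for clarity):

from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class XPUOptions:  # hypothetical container name for illustration
    generate_native_code: bool = False
    advanced_path: bool = False
    enable_tile_load_linear_layout: bool = True
    arch: Optional[str] = None  # a None default lets opt_flags code construct options without an explicit arch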

Comment on lines +58 to +59
group_m = 8
xcd_swizzle = 1
@anmyachev (author):
Not sure about these values.
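
For context, group_m is the standard grouped program-id reordering used in matmul launch grids to keep reused operand tiles hot in cache, and xcd_swizzle = 1 presumably just leaves the AMD-specific XCD swizzle disabled. A minimal sketch of the grouped ordering (generic Triton-style swizzling, not necessarily the exact code these constants feed into):

def swizzle_pid(pid: int, grid_m: int, grid_n: int, group_m: int = 8) -> tuple[int, int]:
    # Visit group_m consecutive rows of output tiles before moving to the
    # next column, which improves L2 reuse of the A/B operand tiles.
    width = group_m * grid_n
    group_id = pid // width
    group_size = min(grid_m - group_id * group_m, group_m)
    pid_m = group_id * group_m + (pid % group_size)
    pid_n = (pid % width) // group_size
    return pid_m, pid_n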

    if split_k > 1:
        pytest.skip("splitK hasn't been fully tested on INTEL GPU.")
    if "float8_e4m3fn" in act_dtype_str and "float8_e4m3fn" in weight_dtype_str:
        pytest.skip("FIXME")
Reviewer (Contributor):

Suggest creating an issue for this and marking the skip with the issue number.

@anmyachev (author):

done
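
(The resulting pattern looks roughly like the following; the issue URL below is a placeholder, the real number is in the merged diff and not reproduced here:)

    if "float8_e4m3fn" in act_dtype_str and "float8_e4m3fn" in weight_dtype_str:
        # Placeholder URL -- the merged skip references the actual tracking issue.
        pytest.skip("FIXME: https://github.com/<org>/<repo>/issues/<NNNN>")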

Signed-off-by: Anatoly Myachev <[email protected]>

def compute_split_k(block_k: int, k: int | None, grid_size: int) -> int:
    device_props = torch.xpu.get_device_properties(0)
    n_sms = device_props.multi_processor_count

Reviewer:

You mean gpu_subslice_count?

Reviewer (Contributor):

Should be gpu_subslice_count.

@anmyachev (author):

Good catch! You're right.
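
With the fix applied, the helper would read roughly as follows; this is a sketch under the assumption that the XPU heuristic mirrors the usual SM-count-based one (split K across whatever subslices the output-tile grid leaves idle), and the clamp against the number of K blocks is an assumption, not taken from the merged diff:

import torch

def compute_split_k(block_k: int, k: int | None, grid_size: int) -> int:
    device_props = torch.xpu.get_device_properties(0)
    n_sms = device_props.gpu_subslice_count  # subslices play the role of SMs on XPU
    # Split along K only when the output-tile grid is too small to occupy the device.
    split_k = max(1, n_sms // max(grid_size, 1))
    if k is not None:
        # Assumed clamp: never split into more pieces than there are K blocks.
        split_k = min(split_k, max(1, k // block_k))
    return split_k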


TRITON_TEST_SUITE=triton_kernels \
-  run_pytest_command -vvv -n ${PYTEST_MAX_PROCESSES:-8} --device xpu .
+  run_pytest_command -vvv -n ${PYTEST_MAX_PROCESSES:-4} --device xpu .
@anmyachev (author):

Otherwise, the Python worker processes start to break.

@anmyachev changed the title from "Implement make_opt_flags function for XPU" to "Implement make_opt_flags function for XPU and enable tests in test_matmul.py" on Sep 10, 2025
@anmyachev merged commit 632d234 into main on Sep 10, 2025
19 checks passed
@anmyachev deleted the amyachev/issue4975 branch on September 10, 2025 13:15

Linked issue: [feature] Lack of XPU support for make_opt_flags function