
Conversation

@anmyachev (Contributor) commented on Sep 5, 2025:

Note for reviewers: I've kept only the most basic heuristics. If you have improvements that are already known to work better and don't need testing, we can apply them directly in this pull request. If you have improvements that still need to be tested, please mention them as well, but I'd prefer to land the basic version as quickly as possible and tune it in follow-up PRs.

Pass rate: 84.11% -> 89.04%
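
For context, a minimal sketch of what such a basic heuristic could look like; the function name, signature, and defaults below are illustrative assumptions, not the merged make_opt_flags implementation, and only the tunables discussed later in this PR (block sizes, group_m, split_k) appear in it:

def make_xpu_opt_flags_sketch(m: int, n: int, k: int) -> dict:
    # Hypothetical illustration only: pick conservative tile sizes and a
    # grouped-ordering factor, and leave split_k to a separate heuristic
    # (this PR adds a compute_split_k helper for that).
    block_m = 128 if m >= 128 else 64
    block_n = 128
    block_k = 32
    group_m = 8
    grid_size = ((m + block_m - 1) // block_m) * ((n + block_n - 1) // block_n)
    split_k = 1  # derived from the device's subslice count in the PR; fixed here for simplicity
    return dict(block_m=block_m, block_n=block_n, block_k=block_k,
                group_m=group_m, split_k=split_k, grid_size=grid_size)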

@anmyachev (author):

Looks like I can test these changes using python/triton_kernels/tests/test_matmul.py
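
For example, something along the lines of pytest -vvv python/triton_kernels/tests/test_matmul.py --device xpu (reusing the --device option from the CI command touched later in this PR) should exercise the new heuristics locally; the exact invocation is an assumption rather than something stated in the PR.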

@anmyachev marked this pull request as ready for review on September 8, 2025 16:47
    if split_k > 1:
        pytest.skip("splitK hasn't been fully tested on AMD GPU.")

elif is_xpu():
@anmyachev (author):

Local result: 980 passed, 2340 skipped.

@anmyachev (author):

New results: 8 failed, 1124 passed, 2188 skipped in 1379.15s (0:22:59) (8 cases added to the skiplists)

elif is_xpu():
    if split_k > 1:
        pytest.skip("splitK hasn't been fully tested on INTEL GPU.")
    if "float8_e4m3fn" in act_dtype_str and "float8_e4m3fn" in weight_dtype_str:
@anmyachev (author):

For these cases I see the following:

python/triton_kernels/tests/test_matmul.py::test_op[False-True-True-True-False-128-1000-400-400-ragged-float8_e4m3fn-float8_e4m3fn-3-1-1-1-False-None] - AssertionError: ref_y_scale: 0.004773152060806751, tri_y_scale: 0.005022321827709675
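
For scale: the two values differ by about 5% relative error (0.005022 - 0.004773 ≈ 0.000249, and 0.000249 / 0.004773 ≈ 0.052), i.e. the float8 output scale is noticeably off rather than wildly wrong.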

generate_native_code: bool = False
advanced_path: bool = False
enable_tile_load_linear_layout: bool = True
arch: str = None
@anmyachev (author):

Otherwise, new tests don't work.
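
These read like backend compile-option fields; a minimal sketch of how such defaults might be declared, assuming an options dataclass (the XPUOptions name and frozen-dataclass shape are illustrative, and arch is typed Optional here for clarity):

from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class XPUOptions:  # hypothetical container name for illustration
    generate_native_code: bool = False
    advanced_path: bool = False
    enable_tile_load_linear_layout: bool = True
    arch: Optional[str] = None  # a None default lets opt_flags code construct options without an explicit arch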

Comment on lines +58 to +59
group_m = 8
xcd_swizzle = 1
@anmyachev (author):
Not sure about these values.
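
For context, group_m is the standard grouped program-id reordering used in matmul launch grids to keep reused operand tiles hot in cache, and xcd_swizzle = 1 presumably just leaves the AMD-specific XCD swizzle disabled. A minimal sketch of the grouped ordering (generic Triton-style swizzling, not necessarily the exact code these constants feed into):

def swizzle_pid(pid: int, grid_m: int, grid_n: int, group_m: int = 8) -> tuple[int, int]:
    # Visit group_m consecutive rows of output tiles before moving to the
    # next column, which improves L2 reuse of the A/B operand tiles.
    width = group_m * grid_n
    group_id = pid // width
    group_size = min(grid_m - group_id * group_m, group_m)
    pid_m = group_id * group_m + (pid % group_size)
    pid_n = (pid % width) // group_size
    return pid_m, pid_n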

    if split_k > 1:
        pytest.skip("splitK hasn't been fully tested on INTEL GPU.")
    if "float8_e4m3fn" in act_dtype_str and "float8_e4m3fn" in weight_dtype_str:
        pytest.skip("FIXME")
Reviewer (Contributor):

Suggest creating an issue for this and marking the skip with the issue number.

@anmyachev (author):

done
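
(The resulting pattern looks roughly like the following; the issue URL below is a placeholder, the real number is in the merged diff and not reproduced here:)

    if "float8_e4m3fn" in act_dtype_str and "float8_e4m3fn" in weight_dtype_str:
        # Placeholder URL -- the merged skip references the actual tracking issue.
        pytest.skip("FIXME: https://github.com/<org>/<repo>/issues/<NNNN>")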

Signed-off-by: Anatoly Myachev <[email protected]>

def compute_split_k(block_k: int, k: int | None, grid_size: int) -> int:
    device_props = torch.xpu.get_device_properties(0)
    n_sms = device_props.multi_processor_count

Reviewer:

You mean gpu_subslice_count?

Reviewer (Contributor):

Should be gpu_subslice_count.

@anmyachev (author):

Good catch! You're right.
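
With the fix applied, the helper would read roughly as follows; this is a sketch under the assumption that the XPU heuristic mirrors the usual SM-count-based one (split K across whatever subslices the output-tile grid leaves idle), and the clamp against the number of K blocks is an assumption, not taken from the merged diff:

import torch

def compute_split_k(block_k: int, k: int | None, grid_size: int) -> int:
    device_props = torch.xpu.get_device_properties(0)
    n_sms = device_props.gpu_subslice_count  # subslices play the role of SMs on XPU
    # Split along K only when the output-tile grid is too small to occupy the device.
    split_k = max(1, n_sms // max(grid_size, 1))
    if k is not None:
        # Assumed clamp: never split into more pieces than there are K blocks.
        split_k = min(split_k, max(1, k // block_k))
    return split_k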


TRITON_TEST_SUITE=triton_kernels \
-  run_pytest_command -vvv -n ${PYTEST_MAX_PROCESSES:-8} --device xpu .
+  run_pytest_command -vvv -n ${PYTEST_MAX_PROCESSES:-4} --device xpu .
@anmyachev (author):

Otherwise, the Python worker processes start to break.

@anmyachev changed the title from "Implement make_opt_flags function for XPU" to "Implement make_opt_flags function for XPU and enable tests in test_matmul.py" on Sep 10, 2025
@anmyachev merged commit 632d234 into main on Sep 10, 2025
19 checks passed
@anmyachev deleted the amyachev/issue4975 branch on September 10, 2025 13:15

Linked issue: [feature] Lack of XPU support for make_opt_flags function