-
Notifications
You must be signed in to change notification settings - Fork 74
[TEST_DEBUG] Enable test device assert #5395
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
There is no updates from pytorch/pytorch#142135 for a while, do you know what's the ETA for that? |
To my knowledge, sometime next year. I'll send you an internal ticket for that. |
|
|
||
| if should_fail: | ||
| abort_or_runtime_error = ( | ||
| result.returncode == 1 or # RuntimeError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In what cases do we get a RuntimeError? At first glance, I would expect all errors of SIGABRT type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is needed for other devices, then you can leave the old implementation for devices that are not XPU: if not is_xpu() and add a separate branch for us with what you wrote, perhaps this can simplify the code. This is optional because I don't know if it will result in less code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe easier to resolve merge conflicts
| mask_str = "None" if mask is None else str(mask) | ||
| opt_flag_str = "None" if opt_flag is None else str(opt_flag) | ||
|
|
||
| result = subprocess.run([ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can set environment variables directly when calling subprocess.run, for example for TRITON_DEBUG. This way you can pass fewer parameters into test_debug_kernels.py.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! I only have minor comments.
Comes from #2755.
This PR is a temporary fix leveraging subprocess for catching the abort signal, enabling test_device_assert until pytorch/pytorch#142135 is resolved.