[BACKEND] Add barrier after assert op to avoid race condition #5035
After this commit (and @peterbell10's #5037, which turns debug=True back on for the test_side_effectful_reduction test), test_side_effectful_reduction and test_side_effectful_scan are hanging for me.
Hmm, I wonder if we end up with a barrier in control flow? @peterbell10, would this op be lowered within control flow?
Ah, it makes sense that they hang, as we are effectively doing a sync inside an if statement lol
yes, #4811 introduces predicated asserts |
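To make the hang concrete, here is a minimal sketch of why a block-wide barrier under a predicate is a problem. It is a toy model, not Triton or CUDA code: `BarrierOutcome` and `barrierInsidePredicate` are hypothetical names, and the model simply assumes the barrier completes only once every thread in the block has arrived. On real hardware a barrier in divergent control flow is not well defined and typically shows up as the kind of hang reported above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of a block-wide barrier: it completes only when every thread
// in the block has arrived. All names here are hypothetical.
enum class BarrierOutcome { Completes, NeverReached, Hangs };

// predicate[i] says whether thread i enters the guarded region that
// contains the barrier (as with an assert lowered under a predicate).
BarrierOutcome barrierInsidePredicate(const std::vector<bool> &predicate) {
  assert(!predicate.empty() && "need at least one thread");
  std::size_t arrived = 0;
  for (bool enters : predicate)
    if (enters)
      ++arrived;
  if (arrived == predicate.size())
    return BarrierOutcome::Completes;    // uniform branch: everyone syncs
  if (arrived == 0)
    return BarrierOutcome::NeverReached; // uniform branch: nobody syncs
  return BarrierOutcome::Hangs;          // divergent: arrivals wait forever
}
```

In this model, a predicate that is true for only some threads is the one case that cannot make progress, which matches the "sync inside an if statement" diagnosis.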
// an op that may trap if the assert condition is true. Since the tensor in
// those two operations may have different layout we need to make sure all
// the threads are done executing the assert before going to the next op.
barrier();
Can we insert the barrier only for RankedTensor conditions? That still fixes the issue, avoids an unnecessary barrier for scalar conds, and fixes the hang with reductions all in one go.
ah yeah sounds like an easy enough solution
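A rough sketch of the suggestion above, written as a self-contained toy rather than the actual lowering code. `CondType`, `lowerAssert`, and `endsWithBarrier` are hypothetical stand-ins; the real pattern would presumably inspect the MLIR type of the assert's condition operand instead of an enum.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the type of the assert's condition operand.
enum class CondType { Scalar, RankedTensor };

// Returns the sequence of operations emitted for one assert, as strings.
std::vector<std::string> lowerAssert(CondType condType) {
  std::vector<std::string> emitted = {"assert"};
  // Only a tensor condition is distributed across threads with some layout,
  // so only that case needs a block-wide sync before the next op.
  if (condType == CondType::RankedTensor)
    emitted.push_back("barrier");
  return emitted;
}

// Helper for inspecting the result: did the lowering end with a barrier?
bool endsWithBarrier(const std::vector<std::string> &ops) {
  assert(!ops.empty() && "lowering always emits at least the assert");
  return ops.back() == "barrier";
}
```

With this gating, a scalar condition emits no barrier at all, which is how the suggestion avoids both the extra sync and the barrier under a predicate.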
Add a barrier to avoid a race condition when an assert is followed by an op that may trap if the assert condition is true. Since the tensors in those two operations may have different layouts, we need to make sure all threads have finished executing the assert before moving on to the next op.
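The layout point can be illustrated with a small sketch. The two distributions below (`ownerBlocked`, `ownerCyclic`) are hypothetical examples of mapping tensor elements to threads, not Triton's actual layout encodings; they only show that the thread checking an element in the assert need not be the thread consuming it in the next op.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layout A: thread t owns a contiguous chunk of elements.
std::size_t ownerBlocked(std::size_t elem, std::size_t numElems,
                         std::size_t numThreads) {
  assert(numThreads > 0 && numElems % numThreads == 0);
  return elem / (numElems / numThreads);
}

// Hypothetical layout B: elements are dealt out round-robin.
std::size_t ownerCyclic(std::size_t elem, std::size_t numThreads) {
  assert(numThreads > 0);
  return elem % numThreads;
}

// True when the thread that checks `elem` in the assert (layout A) differs
// from the thread that consumes it in the following op (layout B).
bool checkedByAnotherThread(std::size_t elem, std::size_t numElems,
                            std::size_t numThreads) {
  return ownerBlocked(elem, numElems, numThreads) !=
         ownerCyclic(elem, numThreads);
}
```

For 8 elements over 4 threads, element 5 is checked by thread 2 under the blocked mapping but consumed by thread 1 under the cyclic one. Without a barrier, thread 1 could reach the trapping op before thread 2 has executed the assert for that element.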