matt-aitken
Member

When short runs were triggered inside triggerAndWait or batchTriggerAndWait, there was a race condition where a checkpoint wasn't created in time. In some cases the run was no longer running in the cluster, so it got stuck frozen forever.

Changes

  • Don't attempt to continue these runs in the cluster if there's no checkpoint.
  • When we create the checkpoint, try to continue these runs (they won't continue if the sub-runs aren't finished).
  • Remove some code that failed attempts in order to prevent infinite recursion. It caused errors in certain conditions where runs would otherwise have succeeded.
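
The first two changes can be illustrated with a minimal sketch. This is not the code from this PR: the `WaitingRun` type, the `ResumeDecision` values, and the function names below are all hypothetical, chosen only to show the ordering problem. The idea is that the same decision is evaluated both when a sub-run finishes and when the checkpoint is created, so whichever event happens last continues the run.

```typescript
// Hypothetical model of a run waiting on sub-runs. These names are
// assumptions for illustration, not the platform's actual types.
type SubRunStatus = "PENDING" | "COMPLETED";

interface WaitingRun {
  id: string;
  hasCheckpoint: boolean;
  subRuns: SubRunStatus[];
}

type ResumeDecision =
  | "WAIT_FOR_CHECKPOINT" // no checkpoint yet: do not try to continue in the cluster
  | "WAIT_FOR_SUB_RUNS" // checkpoint exists, but some sub-runs are unfinished
  | "RESUME_FROM_CHECKPOINT"; // both conditions met: safe to continue

// Decide whether the waiting run can be continued right now.
function decideResume(run: WaitingRun): ResumeDecision {
  if (!run.hasCheckpoint) {
    return "WAIT_FOR_CHECKPOINT";
  }
  const allDone = run.subRuns.every((status) => status === "COMPLETED");
  return allDone ? "RESUME_FROM_CHECKPOINT" : "WAIT_FOR_SUB_RUNS";
}

// Event: a sub-run finished. Record it, then re-evaluate.
function onSubRunCompleted(run: WaitingRun, index: number): ResumeDecision {
  run.subRuns[index] = "COMPLETED";
  return decideResume(run);
}

// Event: the checkpoint was created. Record it, then re-evaluate,
// which covers sub-runs that finished before the checkpoint existed.
function onCheckpointCreated(run: WaitingRun): ResumeDecision {
  run.hasCheckpoint = true;
  return decideResume(run);
}
```

In this sketch, a short sub-run that completes before the checkpoint exists yields `WAIT_FOR_CHECKPOINT` rather than an attempt to continue, and the later `onCheckpointCreated` call is what continues the run.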

changeset-bot bot commented Aug 20, 2024

🦋 Changeset detected

Latest commit: 17994f0

The changes in this PR will be included in the next version bump.


@matt-aitken matt-aitken merged commit 0591db5 into main Aug 20, 2024
2 checks passed
@matt-aitken matt-aitken deleted the fix-triggerandwait-races branch August 20, 2024 12:57