Skip to content

Conversation

@maxdml
Copy link
Collaborator

@maxdml maxdml commented Oct 28, 2025

Do not ignore conflicts. When conflicts are detected (i.e., another process is attempting to record the outcome of the step after the first recording happened), the logic will detect it and return an conflict error.

This helps the end user detect when unexpected concurrent executions take place.

This PR also fixes the logic we use to detect concurrent executions when executing a workflow function. Specifically:

  • newStepExecutionError now wraps the underlying error (it was losing it).
  • Implement a custom errors.Is function to compare DBOSError instances based on their underlying code.
  • Replace the errors.As check by an error.Is. As looks up for the first instance of a matching error, whereas Is iterates through the error tree until its condition is satisfied.

Note that the conflict detection is already exercised in the TestWorkflowRecovery, which recovers a workflow blocked in a step. Both original or recovery execution race to record the outcome of the step, and the loser falls back to polling the database for the workflow result.

Copy link
Member

@kraftp kraftp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there tests for duplicate executions? Were they passing before? Why?

@maxdml
Copy link
Collaborator Author

maxdml commented Oct 28, 2025

Are there tests for duplicate executions? Were they passing before? Why?

TestWorkflowRecovery exercise the duplicate execution path. The test was passing before because of the ON CONFLICT DO NOTHING clause. I don't think we can check we correctly entered the polling fallback path, from the calling code, without modifying the code. Today it shows in the logs with a warning, e.g.:

time=2025-10-28T14:43:55.866-07:00 level=WARN msg="Workflow ID conflict detected. Waiting for existing workflow to complete" workflow_id=recovery-test-2

@kraftp
Copy link
Member

kraftp commented Oct 28, 2025

Are there tests for duplicate executions? Were they passing before? Why?

TestWorkflowRecovery exercise the duplicate execution path. The test was passing before because of the ON CONFLICT DO NOTHING clause. I don't think we can check we correctly entered the polling fallback path, from the calling code, without modifying the code. Today it shows in the logs with a warning, e.g.:

time=2025-10-28T14:43:55.866-07:00 level=WARN msg="Workflow ID conflict detected. Waiting for existing workflow to complete" workflow_id=recovery-test-2

You can easily test this by adding checks in the test workflow function (for example, verifying that anything in the workflow after the conflict is only called once).

@maxdml
Copy link
Collaborator Author

maxdml commented Oct 28, 2025

Are there tests for duplicate executions? Were they passing before? Why?

TestWorkflowRecovery exercise the duplicate execution path. The test was passing before because of the ON CONFLICT DO NOTHING clause. I don't think we can check we correctly entered the polling fallback path, from the calling code, without modifying the code. Today it shows in the logs with a warning, e.g.:
time=2025-10-28T14:43:55.866-07:00 level=WARN msg="Workflow ID conflict detected. Waiting for existing workflow to complete" workflow_id=recovery-test-2

You can easily test this by adding checks in the test workflow function (for example, verifying that anything in the workflow after the conflict is only called once).

yeah, done

@maxdml maxdml merged commit 31315f4 into main Oct 28, 2025
3 checks passed
@maxdml maxdml deleted the fix-step-recording-query branch October 28, 2025 23:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants