-
Notifications
You must be signed in to change notification settings - Fork 246
DRIVERS-1934: withTransaction API retries too frequently #1851
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft
sleepyStick
wants to merge
8
commits into
master
Choose a base branch
from
DRIVERS-1934
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
Changes from 1 commit
Commits
Show all changes
8 commits
Select commit
Hold shift + click to select a range
25759c4
add exponential backoff to withTransactions API
sleepyStick bdcd2ef
run pre-commit
sleepyStick 48890a2
add design rational for backoff
sleepyStick 71ba1ba
fix prose test
sleepyStick b602606
fix pseudocode
sleepyStick 057fbbf
fix test
sleepyStick 42e4d94
account for CSOT / timeoutMS in algorithm
sleepyStick c11aef8
add more details to tests
sleepyStick File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Some comments aren't visible on the classic Files Changed page.
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -99,7 +99,8 @@ has not been exceeded, the driver MUST retry a transaction that fails with an er | |
| "TransientTransactionError" label. Since retrying the entire transaction will entail invoking the callback again, | ||
| drivers MUST document that the callback may be invoked multiple times (i.e. one additional time per retry attempt) and | ||
| MUST document the risk of side effects from using a non-idempotent callback. If the retry timeout has been exceeded, | ||
| drivers MUST NOT retry the transaction and allow `withTransaction` to propagate the error to its caller. | ||
| drivers MUST NOT retry the transaction and allow `withTransaction` to propagate the error to its caller. When retrying, | ||
| drivers MUST implement an exponential backoff with jitter following the algorithm described below. | ||
|
|
||
| If an error bearing neither the UnknownTransactionCommitResult nor the TransientTransactionError label is encountered at | ||
| any point, the driver MUST NOT retry and MUST allow `withTransaction` to propagate the error to its caller. | ||
|
|
@@ -129,7 +130,13 @@ This method should perform the following sequence of actions: | |
| 1. If the ClientSession is in the "starting transaction" or "transaction in progress" state, invoke | ||
| [abortTransaction](../transactions/transactions.md#aborttransaction) on the session. | ||
| 2. If the callback's error includes a "TransientTransactionError" label and the elapsed time of `withTransaction` is | ||
| less than 120 seconds, jump back to step two. | ||
| less than 120 seconds, sleep for `jitter * min(BACKOFF_INITIAL * (1.25**retry), BACKOFF_MAX)` where: | ||
| 1. jitter is a random float between [0, 1) | ||
| 2. retry is one less than the number of times Step 2 has been executed since Step 1 was executed | ||
| 3. BACKOFF_INITIAL is 1ms | ||
| 4. BACKOFF_MAX is 500ms | ||
|
|
||
| Then, jump back to step two. | ||
| 3. If the callback's error includes a "UnknownTransactionCommitResult" label, the callback must have manually | ||
| committed a transaction, propagate the callback's error to the caller of `withTransaction` and return | ||
| immediately. | ||
|
|
@@ -154,11 +161,18 @@ This method should perform the following sequence of actions: | |
| This method can be expressed by the following pseudo-code: | ||
|
|
||
| ```typescript | ||
| var BACKOFF_INITIAL = 1 // 1ms initial backoff | ||
| var BACKOFF_MAX = 500 // 500ms max backoff | ||
| withTransaction(callback, options) { | ||
| // Note: drivers SHOULD use a monotonic clock to determine elapsed time | ||
| var startTime = Date.now(); // milliseconds since Unix epoch | ||
| var retry = 0 | ||
|
|
||
| retryTransaction: while (true) { | ||
| if (retry > 0): | ||
| sleep(Math.random() * min(BACKOFF_INITIAL * (1.25**retry), | ||
| BACKOFF_MAX)) | ||
| retry += 1 | ||
| this.startTransaction(options); // may throw on error | ||
|
|
||
| try { | ||
|
|
@@ -324,7 +338,7 @@ exceed the user's original intention for `maxTimeMS`. | |
| The callback may be executed any number of times. Drivers are free to encourage their users to design idempotent | ||
| callbacks. | ||
|
|
||
| A previous design had no limits for retrying commits or entire transactions. The callback is always able indicate that | ||
| A previous design had no limits for retrying commits or entire transactions. The callback is always able to indicate that | ||
|
||
| `withTransaction` should return to its caller (without future retry attempts) by aborting the transaction directly; | ||
| however, that puts the onus on avoiding very long (or infinite) retry loops on the application. We expect the most | ||
| common cause of retry loops will be due to TransientTransactionErrors caused by write conflicts, as those can occur | ||
|
|
@@ -356,6 +370,7 @@ provides an implementation of a technique already described in the MongoDB 4.0 d | |
| ([DRIVERS-488](https://jira.mongodb.org/browse/DRIVERS-488)). | ||
|
|
||
| ## Changelog | ||
| - 2025-10-17: withTransaction applies exponential backoff when retrying. | ||
|
|
||
| - 2024-09-06: Migrated from reStructuredText to Markdown. | ||
|
|
||
|
|
||
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know python uses
**as the exponential operator. I don't believe that is a standard across other programming languages. Is it clear that its exponent? Is there a different preferred symbol for exponent? (I know math commonly uses^but it typically also means bitwise XOR in code so I felt like that could be confusing.)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
**seems fine to me, but I might also be biased because that's what Node does 🙂. But I don't think people will have trouble understanding what this means.