
Conversation

@blink1073 (Member):

Please complete the following before merging:

  • Is the relevant DRIVERS ticket in the PR title?

blink1073 changed the title from "DRIVERS-318 Avoid clearing the connection pool when the server connection rate limiter triggers" to "DRIVERS-3218 Avoid clearing the connection pool when the server connection rate limiter triggers" on Oct 28, 2025
@blink1073 (Member Author):

I'll get all of the tests passing in mongodb/mongo-python-driver#2598 and then include them in this PR.

- A successful heartbeat does NOT change the state of the pool.
- A failed heartbeat clears the pool.
- A subsequent failed connection will increase the backoff attempt.
- A successful connection will return the pool to the ready state.
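
A minimal sketch of those transitions, assuming a hypothetical pool with ready/backoff/paused states (all names here are illustrative assumptions, not the spec's API):

```python
from enum import Enum, auto

class PoolState(Enum):
    READY = auto()
    BACKOFF = auto()  # illustrative; the spec may name this PoolBackoff
    PAUSED = auto()

class Pool:
    """Illustrative pool; state names and methods are assumptions."""

    def __init__(self):
        self.state = PoolState.READY
        self.backoff_attempt = 0

    def on_heartbeat_success(self):
        # A successful heartbeat does NOT change the state of the pool.
        pass

    def on_heartbeat_failure(self):
        # A failed heartbeat clears (pauses) the pool.
        self.state = PoolState.PAUSED
        self.backoff_attempt = 0

    def on_connection_failure(self):
        # A subsequent failed connection increases the backoff attempt.
        self.state = PoolState.BACKOFF
        self.backoff_attempt += 1

    def on_connection_success(self):
        # A successful connection returns the pool to the ready state.
        self.state = PoolState.READY
        self.backoff_attempt = 0
```
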
Member:

Could we add a description of the exponential backoff + jitter for the backoff duration?
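
For illustration only, here is one common shape such a policy can take: capped exponential backoff with full jitter. The base delay, cap, and jitter style below are assumptions, not what the PR ultimately specifies:

```python
import random

BASE_DELAY_MS = 100      # assumed initial delay; the spec may choose differently
MAX_DELAY_MS = 10_000    # assumed cap on the backoff duration

def backoff_delay_ms(attempt: int) -> float:
    """Capped exponential backoff with full jitter (illustrative policy).

    attempt is 1-based: the first backoff uses the base delay.
    """
    capped = min(MAX_DELAY_MS, BASE_DELAY_MS * (2 ** (attempt - 1)))
    # Full jitter: sample uniformly in [0, capped] so concurrent clients
    # do not retry in lockstep.
    return random.uniform(0, capped)
```
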

@blink1073 (Member Author):

Should we define the backoff and jitter policy in one place and link to it? If so, should I add it in this PR and where?

Member:

I think it's small and simple enough that it should be defined here alongside where it will be used.

@@ -1,40 +1,38 @@
version: 1
style: integration
Contributor:

We can't modify spec tests like this anymore, because doing so will break drivers that track the spec tests via submodules and haven't implemented backoff yet; if those drivers then skip these tests, they lose coverage.

Comment on lines +88 to +93
- poolBackoffEvent: {}
- poolBackoffEvent: {}
- poolBackoffEvent: {}
- poolBackoffEvent: {}
- poolBackoffEvent: {}
- poolBackoffEvent: {}
Contributor:

For this, and the tests below: where are all these pool backoff events coming from? I get zero or one of them in Node (depending on my implementation; see below).

This is related to my comment in the design. Even if I align my implementation with the design, I only get one backoff event because there are no establishment attempts:

  • there are no requests in the wait queue
  • minPoolSize is 0, so no background thread population

Member:

The issue is that these extra backoff attempts occur because of the new retry logic, which is already implemented in the branch Steve is using to test these changes in Python. Is there a way we can test this without tying the two projects together? Otherwise, drivers will need to implement the retry logic first.

Contributor:

How does the retry logic come into play here? I thought we only retried commands with a SystemOverloadError error label. And the insert does have an expectError, so the insert fails on Steve's branch as well.

@ShaneHarvey (Member), Oct 30, 2025:

Similar to how the driver labels connection errors with "RetryableWriteError" for retryable writes, we add the "RetryableError" and "SystemOverloadedError" labels to these errors so that the command can later be retried. We'll need to clarify this rule in this PR.
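
A rough sketch of that labeling rule; the exception type and helper names here are hypothetical, not the Python driver's actual API:

```python
class EstablishmentError(Exception):
    """Hypothetical stand-in for a driver connection-establishment error."""

    def __init__(self, message: str):
        super().__init__(message)
        self.error_labels: set[str] = set()

def label_overload_error(exc: EstablishmentError) -> EstablishmentError:
    # Mirror the retryable-writes pattern ("RetryableWriteError"): attach
    # labels so the retry logic can later decide to retry the command.
    exc.error_labels.add("RetryableError")
    exc.error_labels.add("SystemOverloadedError")
    return exc

def should_retry(exc: EstablishmentError) -> bool:
    return "RetryableError" in exc.error_labels
```
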

Comment on lines +288 to +291

connection checkout fails under conditions that indicate server overload. The rules for entering backoff mode are as follows:

- A network error or network timeout during the TCP handshake or the `hello` message for a new connection MUST trigger the backoff state.
- Other pending connections MUST NOT be canceled.
- In the case of multiple pending connections, the backoff attempt number MUST only be incremented once. This can be done by recording the state prior
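
A minimal sketch of the "incremented once" rule, under the assumption that the pool records a generation before each establishment attempt (all names here are illustrative):

```python
import threading

class Pool:
    """Illustrative only; not the spec's wording or any driver's API."""

    def __init__(self):
        self._lock = threading.Lock()
        self.backoff_attempt = 0
        self._generation = 0  # bumped each time the pool enters backoff

    def before_establishment(self) -> int:
        # Record the state prior to the attempt, as the quoted text suggests.
        with self._lock:
            return self._generation

    def on_handshake_network_error(self, prior_generation: int) -> None:
        with self._lock:
            if self._generation != prior_generation:
                # Another pending connection already triggered backoff for
                # this round; do not increment the attempt again. Pending
                # connections are not canceled.
                return
            self._generation += 1
            self.backoff_attempt += 1
```
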
Contributor:

I was talking with @ShaneHarvey about this (related comments on drivers ticket). Shane's understanding is that we decided to include all timeout errors, regardless of where it originated, during connection establishment. Does that match your understanding, Steve?

And related: the design says:

After a connection establishment failure the pool enters the PoolBackoff state.

We should update the design with whatever the outcome of this thread is.
