synapse/http/server.py (13 changes: 8 additions & 5 deletions)
@@ -80,7 +80,7 @@
from synapse.util import Clock, json_encoder
from synapse.util.caches import intern_dict
from synapse.util.cancellation import is_function_cancellable
from synapse.util.iterutils import chunk_seq

Check failure on line 83 in synapse/http/server.py (GitHub Actions / lint):

Ruff (F401): synapse/http/server.py:83:36: F401 `synapse.util.iterutils.chunk_seq` imported but unused

if TYPE_CHECKING:
import opentracing
@@ -909,6 +909,9 @@
large response bodies.
"""

@MadLittleMods (Contributor, Author) commented on Aug 21, 2025:

What timeout is this comment referring to?

# The problem with dumping all of the response into the `Request` object at
# once (via `Request.write`) is that doing so starts the timeout for the
# next request to be received: so if it takes longer than 60s to stream back
# the response to the client, the client never gets it.

cc @richvdh, since this idea originally came from matrix-org/synapse#3693 / matrix-org/synapse#3701
cc @erikjohnston, who introduced this comment into the codebase in matrix-org/synapse#10844 / matrix-org/synapse#10905

(see PR description for more context)

As far as I can tell, we don't time out at all at the Synapse level (and I don't think Twisted does either).

Is this some sort of reverse-proxy layer timeout interaction we're talking about here?

For example with nginx:

proxy_connect_timeout       60;
proxy_send_timeout          60;
proxy_read_timeout          60;
send_timeout                60;

Or HAProxy:

# wait for 5s when connecting to a server
timeout connect 5s

# ... but if there is a backlog of requests, wait for 60s before returning a 500
timeout queue 60s

# close client connections 5m after the last request
# (as recommended by https://support.cloudflare.com/hc/en-us/articles/212794707-General-Best-Practices-for-Load-Balancing-with-CloudFlare)
timeout client 900s

# give clients 5m between requests (otherwise it defaults to the value of 'timeout http-request')
timeout http-keep-alive 900s

# give clients 10s to complete a request (either time between handshake and first request, or time spent sending headers)
timeout http-request 10s

# time out server responses after 180s
timeout server 180s

@richvdh (Member) commented on Aug 21, 2025:

I think it's the request timeout within Twisted. From the Twisted docs:

The initial value of timeOut, which defines the idle connection timeout in seconds, or None to disable the idle timeout.

That is then passed into HTTPChannel, which starts and stops the timeout at appropriate times.
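
A tiny illustration of that hand-off (hedged: written against my reading of the Site/HTTPFactory API, with made-up values), showing that the factory-level timeout ends up as timeOut on each channel:

from twisted.web.resource import Resource
from twisted.web.server import Site

site = Site(Resource(), timeout=60)      # idle-connection timeout, in seconds
channel = site.buildProtocol(addr=None)  # what listenTCP does for each incoming connection
print(channel.timeOut)                   # -> 60, armed/reset by the channel's TimeoutMixin machinery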

Looking again at what we had before matrix-org/synapse#3701, which was effectively:

    request.write(json_bytes)
    request.finish()

request.write passes its data down into ITransport.write, which says:

No data will ever be lost, although (obviously) the connection may be closed before it all gets through.

In other words, if json_bytes is large, it may get buffered, in memory, within the transport.

request.finish() then re-starts Twisted's idle connection timer for the HTTPChannel. So, it's entirely possible that the idle timer elapses before we have finished flushing out all the response bytes. Hence, the problem that PR set out to fix.

Of course, it's possible this has changed in the intervening 7 years, but I don't think so.

(And yes, I probably should have reported it to Twisted at the time. My bad.)
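
For concreteness, here is a rough sketch of the shape of the fix that matrix-org/synapse#3701 went with: register a streaming (push) producer on the request, hand bytes over only as the transport drains, and call finish() only once everything has been accepted. This is a simplified, hypothetical illustration rather than Synapse's actual _ByteProducer:

from zope.interface import implementer
from twisted.internet.interfaces import IPushProducer

@implementer(IPushProducer)
class ChunkedBodyProducer:
    """Sketch of a push producer that paces a response body out to the transport."""

    def __init__(self, request, body: bytes, chunk_size: int = 4096):
        self._request = request
        self._chunks = [body[i : i + chunk_size] for i in range(0, len(body), chunk_size)]
        self._paused = False
        # The transport calls pauseProducing()/resumeProducing() on us as its
        # write buffer fills up and drains.
        request.registerProducer(self, True)
        self.resumeProducing()

    def pauseProducing(self) -> None:
        self._paused = True

    def resumeProducing(self) -> None:
        self._paused = False
        while self._chunks and not self._paused:
            self._request.write(self._chunks.pop(0))
        if not self._chunks:
            # Only now does finish() run, so the idle timer is not re-armed while a
            # large buffered body is still being flushed out.
            self._request.unregisterProducer()
            self._request.finish()

    def stopProducing(self) -> None:
        # The connection went away before we got everything out; drop what's left.
        self._chunks = []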

@MadLittleMods (Contributor, Author) replied:

Thank you @richvdh 🙏

I was able to put together a reproduction case for this problem and filed a bug report in the Twisted repo: twisted/twisted#12498
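
For reference, here is a minimal sketch of the kind of setup involved (hypothetical code, not the actual reproduction attached to that issue): a Site with a deliberately short idle timeout serving a large body via a single write() followed by finish(); reading it back slowly (e.g. curl --limit-rate 50k http://localhost:8080/) lets the idle timer fire before the buffered bytes reach the client.

from twisted.internet import reactor
from twisted.web.resource import Resource
from twisted.web.server import NOT_DONE_YET, Site

class BigBlob(Resource):
    isLeaf = True

    def render_GET(self, request):
        # Hand ~100 MB to the transport in one go; it all gets buffered in memory.
        request.write(b"x" * (100 * 1024 * 1024))
        # finish() re-arms the idle-connection timer even though most of those
        # bytes may not have reached the client yet.
        request.finish()
        return NOT_DONE_YET

# `timeout` is the HTTPFactory idle-connection timeout discussed above; 5 seconds is
# deliberately short so a slow reader trips it long before the body is delivered.
reactor.listenTCP(8080, Site(BigBlob(), timeout=5))
reactor.run()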

request.write(bytes_to_write)
request.finish()

# The problem with dumping all of the response into the `Request` object at
# once (via `Request.write`) is that doing so starts the timeout for the
# next request to be received: so if it takes longer than 60s to stream back
@@ -919,12 +922,12 @@

# To make sure we don't write all of the bytes at once we split it up into
# chunks.
chunk_size = 4096
bytes_generator = chunk_seq(bytes_to_write, chunk_size)
# chunk_size = 4096
# bytes_generator = chunk_seq(bytes_to_write, chunk_size)

# We use a `_ByteProducer` here rather than `NoRangeStaticProducer` as the
# unit tests can't cope with being given a pull producer.
_ByteProducer(request, bytes_generator)
# # We use a `_ByteProducer` here rather than `NoRangeStaticProducer` as the
# # unit tests can't cope with being given a pull producer.
# _ByteProducer(request, bytes_generator)


def set_cors_headers(request: "SynapseRequest") -> None: