Keep-alives and connect_timeout bug #328
Conversation
Commits: “Added support for keep-alive in SimpleAsyncClient, also fixed bug where async connections would close due to connection timeout even after the connection had been established” and “…ignore all the keepalive logic”
I tried merging this and all the unit tests failed. Please make sure that the tests pass (run `python -m tornado.test.runtests`), and also add some new tests to ensure that connections are getting reused.
Jawesome
I just pulled all the new stuff and everything seems to be working (passed all the tests...). Could you possibly share which tests were failing and how? I'm going to add in new tests before I commit. Thanks for the feedback.
Here's the log from the tests:
Perfect, thanks.
How’s progress, NickNeedsAName? This looks like a great feature and you’ve clearly put a lot of effort in. Would be a shame to see this die.
Hah! Hey. I haven't touched it in a loonnggg time, but I'll give it a look over.
Just an update -- I pulled the most recent code and now the tests pass? O.o
So, I figured out what the issue with the redirects was: in _on_body I was immediately re-allocating the stream if we were keeping things alive; now I wait to do that until after the redirect logic. All the tests are passing, but what's the best way to test that keep-alives are successfully implemented? I'm currently logging the time the requests take and such, but that's somewhat spotty and not always accurate. I'm also logging whether we're using a new stream or the same stream, but then I'd have to add a whole bunch of other logic to pass back whether or not the client is using a new stream -- probably not the best way. Here's the output I'm getting (pinging the '/hello' url in the test cases), which I'd say is pretty satisfactory for proving that keep-alives are actually working properly. But again, what is there to actually test against?
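One low-friction way to check reuse without threading extra state back out of the client (a hedged sketch, not code from this patch): have a test handler echo the client's ephemeral port as the server sees it. Two fetches over a reused stream report the same port, while a client without keep-alive reports two different ports, which is exactly what makes this a useful regression test. `PeerPortHandler` is a made-up name, and `request.connection.stream.socket` assumes HTTP/1.x internals.

```python
# Hedged sketch (not from the patch): expose which TCP connection served
# each request by echoing the client's ephemeral port in the body.
from tornado.testing import AsyncHTTPTestCase
from tornado.web import Application, RequestHandler

class PeerPortHandler(RequestHandler):
    def get(self):
        # The client's ephemeral port identifies the underlying connection.
        host, port = self.request.connection.stream.socket.getpeername()[:2]
        self.write(str(port))

class KeepAliveTest(AsyncHTTPTestCase):
    def get_app(self):
        return Application([(r"/port", PeerPortHandler)])

    def test_stream_is_reused(self):
        first = self.fetch("/port").body
        second = self.fetch("/port").body
        # With working keep-alive, both requests ride the same connection;
        # against the stock client this assertion fails.
        self.assertEqual(first, second)
```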
I decided to extend the existing code in the hope of adding the pieces required to reach a release candidate of this patch. All the tests in the existing suite pass. However, I do not consider this mature enough yet; some things are missing which I feel are best discussed with those willing to participate before proceeding with the patch. On top of the existing patch I have:
What I am considering now, and on which I hope to receive feedback, is the best way of testing this functionality. I would need to verify that the connection count is the same on the client as on the server, where the count differs depending on the number of requests and whether keep-alive is utilized. Currently there is no easy way to retrieve the connection count from either the server or the client, to my knowledge, but it could be added. Otherwise, I could check this against the IOLoop directly. Any thoughts on this? The second thing is the close callback on client connections. Currently this is set to None when keep-alive is utilized, which I am unsure is a good idea. With keep-alive I would suggest moving it to the termination of the entire queue; otherwise it should remain as is, i.e. per termination of a single stream. General feedback on the extension of this patch is also more than welcome.
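Server-side, one way to get that count with minimal plumbing (a sketch under the assumption that `AsyncHTTPTestCase.get_http_server` can be overridden; `CountingHTTPServer` and the handler are illustrative names, not from the patch):

```python
# Hedged sketch: count accepted TCP connections by wrapping HTTPServer.
from tornado.httpserver import HTTPServer
from tornado.testing import AsyncHTTPTestCase
from tornado.web import Application, RequestHandler

class HelloHandler(RequestHandler):
    def get(self):
        self.write("Hello world!")

class CountingHTTPServer(HTTPServer):
    """HTTPServer that counts every TCP connection it accepts."""
    def initialize(self, *args, **kwargs):
        super().initialize(*args, **kwargs)
        self.connection_count = 0

    def handle_stream(self, stream, address):
        self.connection_count += 1
        return super().handle_stream(stream, address)

class ConnectionCountTest(AsyncHTTPTestCase):
    def get_app(self):
        self.app = Application([(r"/hello", HelloHandler)])
        return self.app

    def get_http_server(self):
        self.server = CountingHTTPServer(self.app)
        return self.server

    def test_two_requests_share_one_connection(self):
        self.fetch("/hello")
        self.fetch("/hello")
        # Holds only if the client actually reuses connections (i.e. with
        # this patch applied); the stock client would open two.
        self.assertEqual(self.server.connection_count, 1)
```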
Haven't read the patch in detail yet, but in the SPDY fork, I've added a …
Excited that other people see this as worthwhile! I'm also at a loss (have been at a loss, but honestly haven't put much time into thinking about it) regarding how to test that this is doing what it claims to do. Your suggestions make sense. Another interesting thing is that this actually has the potential for massive memory problems. Assume there exists a sort of steady state between the client and a server where the client is constantly using 5 connections to communicate with the server. Then something happens to the server and the new steady state is 1000 connections (the remote server becomes slow/sluggish/unresponsive). When things go back to normal there are still 1000 connections in the queue, and they'll never be closed, because you'll probably cycle through all 1000 over the course of a few seconds, causing them to remain open, etc. This is something I'm currently thinking about.
@alekstorm: I can definitely add support for that. I hope to be able to spend a day on this during the weekend to - hopefully - close this pull request. I will focus on writing an exhaustive suite of tests covering this behavior. I have also thought about the issues you mention, @NickNeedsAName. I believe the best solution is to improve the handling of the scenario where all existing streams are performing I/O. Currently we then initialize an entirely new connection/stream and add it to the stream queue. We could instead wait, by queuing the request once more and having a timeout for how long this behavior is allowed before we raise an exception, along with some sanity precautions to ensure only a few attempts/connections are set up against each endpoint (scheme, host, port). There is definitely work to be done here to get this running smoothly. Hopefully I can reduce the list of things needed this weekend so that we can reach a solution quickly.
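That wait-and-retry behavior might look roughly like the following. This is purely a sketch: `EndpointPool`, `acquire_stream`, and both constants are hypothetical names, and a real implementation would surface the timeout as an error on the request rather than raising from an IOLoop callback.

```python
# Hedged sketch: when every pooled stream for an endpoint is busy, re-queue
# the request instead of opening yet another connection, and give up after
# a deadline. All names and numbers here are illustrative.
import collections
import time

from tornado.ioloop import IOLoop

MAX_STREAMS_PER_ENDPOINT = 4   # sanity cap per (scheme, host, port)
QUEUE_TIMEOUT = 20.0           # seconds a request may wait for a free stream

class EndpointPool(object):
    def __init__(self):
        self.idle = collections.deque()  # streams ready for reuse
        self.active = 0                  # streams currently doing I/O

def acquire_stream(pool, callback, deadline=None):
    """Call callback(stream) with a reusable stream, or callback(None) to
    signal that the caller should open a fresh connection."""
    if deadline is None:
        deadline = time.time() + QUEUE_TIMEOUT
    if pool.idle:
        callback(pool.idle.pop())
    elif pool.active < MAX_STREAMS_PER_ENDPOINT:
        pool.active += 1
        callback(None)  # under the cap: caller opens a new connection
    elif time.time() > deadline:
        raise IOError("Timed out waiting for a free stream")
    else:
        # At the cap and everything is busy: retry shortly on the IOLoop.
        IOLoop.instance().add_timeout(
            time.time() + 0.05,
            lambda: acquire_stream(pool, callback, deadline))
```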
@birknilson What I'm currently implementing/trying: instead of just a FIFO queue, use a LIFO queue. Then on every request we can check whether this stream_key has a garbage-collection timeout set; if it does, do nothing, and if it doesn't, set one. Every time we use a stream I update its "last_used" time, so on garbage collection we basically pull everything out of the queue, close all the streams that haven't been used in a while, and put the rest back in the right order. Since Tornado is single-threaded, there shouldn't be issues with trying to pull things from the queue that aren't there, and anything that's not in the queue because it's in use... well, that's also fine, because it's in use! The LIFO queue fixes the 5 -> 1000 -> 5 steady-state issue, because afterwards only the front 5 streams in the queue will ever be used.
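Here is a compact sketch of that LIFO-plus-GC idea (all names are illustrative, not from the patch; a plain list works as the stack, and since Tornado is single-threaded no locking is needed):

```python
# Hedged sketch of a LIFO stream pool with idle garbage collection.
import time

from tornado.ioloop import IOLoop

IDLE_TIMEOUT = 60.0  # close streams unused for this long (illustrative)

class StreamPool(object):
    """The most recently returned stream is handed out first, so the excess
    connections from a 5 -> 1000 -> 5 spike sink to the bottom and age out."""

    def __init__(self):
        self.streams = []          # stack of (last_used, stream), oldest first
        self.gc_scheduled = False

    def put(self, stream):
        self.streams.append((time.time(), stream))
        if not self.gc_scheduled:
            self.gc_scheduled = True
            IOLoop.instance().add_timeout(time.time() + IDLE_TIMEOUT,
                                          self.collect)

    def get(self):
        # LIFO: reuse the hottest stream; None means open a new connection.
        return self.streams.pop()[1] if self.streams else None

    def collect(self):
        now = time.time()
        keep = []
        for last_used, stream in self.streams:
            if now - last_used > IDLE_TIMEOUT:
                stream.close()     # idle too long: release the socket
            else:
                keep.append((last_used, stream))
        self.streams = keep        # original order preserved
        self.gc_scheduled = bool(keep)
        if keep:
            IOLoop.instance().add_timeout(now + IDLE_TIMEOUT, self.collect)
```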
@NickNeedsAName OK, great. Since you are already working on a patch for that behavior, I can focus entirely on writing the necessary tests for this pull request, along with integrating the …
I can fork your repository and submit my existing changes as a pull request to it, and do the same later with the tests & minor changes mentioned above.
Since I can't add comments inline outside of a pull request or specific commit, I've collected some here:
I'll look at the stream logic more closely tomorrow.
Are there any plans to continue this effort?
There's a newer effort in #1457.
I'm not currently working on this, so def go with #1457.
Closing due to age. The latest attempt at this problem is #1622. |
Original pull request description:
Added support for keep-alive in SimpleAsyncClient; also fixed a bug where async connections would close due to the connection timeout even after the connection had been established.
Decided not to go with the suggested implementation of keeping a pool of _HTTPConnections, as it seemed cumbersome to maintain all the state of an object that would (potentially) be overwritten every time. Decided instead on keeping a queue of streams (essentially sockets) that are reused as soon as they become available.
Streams are keyed in the stream_map with a (scheme, host, port) tuple (or some permutation, I forget).
The client defaults to keep_alive; dead sockets are cycled out if they're found dead and not in use, and when a stream is no longer in use it drops its reference to the current _HTTPConnection and readies itself for the next one.
Any suggestions/problems/fixes, let me know.
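For reference, a minimal sketch of the keying scheme described above, in modern Python; the patch's actual stream_map shape may differ, as the author says, and `stream_key` is an illustrative helper:

```python
# Hedged sketch: one queue of reusable streams per (scheme, host, port).
import collections
from urllib.parse import urlparse

stream_map = collections.defaultdict(collections.deque)

def stream_key(url):
    """Endpoint key for pooling: requests to the same scheme/host/port
    may reuse each other's streams."""
    parsed = urlparse(url)
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    return (parsed.scheme, parsed.hostname, port)

# On completion of a response, the stream goes back into its queue...
#     stream_map[stream_key(request.url)].append(stream)
# ...and the next request to the same endpoint pops it for reuse:
#     queue = stream_map[stream_key(request.url)]
#     stream = queue.pop() if queue else None  # None -> open a new connection
```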