Description
Sorry for not following the issue template; it is easier to explain the issue this way. I will be happy to elaborate.
On a production server, after a long uptime or a mass disconnection, I noticed a memory leak related to zlib and WebSocket objects.
This affects all our servers, but it was most severe on one that went through a mass disconnect from 2.5k connected users down to 50 connected users. A memory dump of that server showed:
- We have 87k instances of the `WebSocket` class
- The AsyncLimiter queue has thousands of items
I managed to reproduce this locally with:
- A socket.io server (this is my setup) with per-message deflate enabled.
- Connected 200 users simultaneously and disconnected them all at once. Memory usage stayed the same; after a manual garbage collection and a 2-minute wait, a memory dump showed the server still had `WebSocket` instances in memory.
- Connected again with 20 users, and now I can't connect: I get no responses from the server. The server is stuck.
Root cause
The AsyncLimiter has 10 pending jobs that never finish. This is the result of a race: once close has been called on the inflate/deflate stream, the callback of any later flush is never invoked, so the job that issued the flush never completes.
Example:
```js
inflate.write(...);
inflate.close(); // happens asynchronously
inflate.flush(() => /* won't be called */);
```

I made some patches in the code to make sure the callback is called, but I didn't like the solution of attaching a callback to the inflate/deflate objects, since one object can handle multiple operations at the same time and we could run into further races.
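To illustrate why a handful of lost callbacks is enough to stall the whole server, here is a minimal sketch. `TinyLimiter` is a hypothetical stand-in written for this example, not the actual AsyncLimiter code; the only assumption is that a slot is released when the job calls its `done` callback, and that the concurrency limit is 10 as described above.

```js
// Hypothetical stand-in for a concurrency limiter (not the real AsyncLimiter).
class TinyLimiter {
  constructor(concurrency) {
    this.concurrency = concurrency;
    this.pending = 0; // jobs currently holding a slot
    this.queue = [];  // jobs waiting for a free slot
  }

  push(job) {
    this.queue.push(job);
    this.run();
  }

  run() {
    while (this.pending < this.concurrency && this.queue.length > 0) {
      const job = this.queue.shift();
      this.pending++;
      // The slot is released only when the job calls `done`.
      job(() => {
        this.pending--;
        this.run();
      });
    }
  }
}

const limiter = new TinyLimiter(10);

// Ten jobs whose callback is lost, simulating a flush on a closed stream.
for (let i = 0; i < 10; i++) {
  limiter.push((done) => { /* `done` is never called */ });
}

// Every later job is queued but never started.
let started = 0;
for (let i = 0; i < 1000; i++) {
  limiter.push((done) => { started++; done(); });
}

console.log(limiter.pending, limiter.queue.length, started); // 10 1000 0
```

Once all 10 slots are held by jobs that never finish, the queue only grows, which would match both the stuck server and the thousands of queued items in the memory dump.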
In my opinion, we should decouple inflate/deflate from per-message deflate and create a class that has an internal queue and makes the calls to inflate/deflate. On cleanup it will flush the queue. This way we can solve the issues I described above.
I can submit a pull request for the above approach.
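As a rough illustration of the idea only (this is not ws code, and `ZlibJobQueue`, `run` and `cleanup` are hypothetical names chosen for this sketch), such a class could own one zlib stream, process jobs one at a time, and fail every outstanding job when it is cleaned up, so that no callback is ever lost:

```js
const zlib = require('zlib');

// Hypothetical wrapper: owns one zlib stream and guarantees that every
// callback passed to run() is eventually called, with data or an error.
class ZlibJobQueue {
  constructor(stream) {
    this.stream = stream;
    this.jobs = [];      // waiting jobs: { data, callback }
    this.current = null; // job currently being processed
    this.closed = false;
    // A stream error fails all outstanding jobs instead of leaving them pending.
    this.stream.on('error', (err) => this.cleanup(err));
  }

  run(data, callback) {
    if (this.closed) {
      process.nextTick(callback, new Error('queue closed'));
      return;
    }
    this.jobs.push({ data, callback });
    this.next();
  }

  next() {
    if (this.current || this.closed || this.jobs.length === 0) return;
    const job = this.jobs.shift();
    this.current = job;
    const chunks = [];
    const onData = (chunk) => chunks.push(chunk);
    this.stream.on('data', onData);
    this.stream.write(job.data);
    this.stream.flush(() => {
      this.stream.removeListener('data', onData);
      if (this.closed) return; // cleanup() already failed this job
      this.current = null;
      job.callback(null, Buffer.concat(chunks));
      this.next();
    });
  }

  // Close the stream and fail the running job and all queued jobs.
  cleanup(err = new Error('queue closed')) {
    if (this.closed) return;
    this.closed = true;
    const jobs = this.current ? [this.current, ...this.jobs] : this.jobs;
    this.current = null;
    this.jobs = [];
    this.stream.close();
    for (const job of jobs) job.callback(err);
  }
}

// Usage: closing while jobs are outstanding still invokes both callbacks.
const queue = new ZlibJobQueue(zlib.createDeflateRaw());
const results = [];
queue.run(Buffer.from('a'), (err) => results.push(err ? 'failed' : 'ok'));
queue.run(Buffer.from('b'), (err) => results.push(err ? 'failed' : 'ok'));
queue.cleanup();
console.log(results); // both callbacks have already fired with an error
```

The point of the design is that the caller (for example a limiter job) always gets its callback, so a close that races with a flush can no longer leave a slot occupied forever.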
Reproducible in:
- ws version: 6.4.1
- socket.io version: 2.2.0
- Node.js version(s): v10.15.3
- OS version(s): Mac (local) / Linux (production)
Memory dump screenshot
The picture above is a memory dump of the server that had the mass disconnect from 2.5k to 50 sockets:
- 88,992 instances of PerMessageDeflate
- 178,012 instances of WebSocket
- Array / Queue: this is the AsyncLimiter queue
