Skip to content

Per-Message Deflate Memory Leak #1617

@yosiat

Description

@yosiat

sorry for not following the issue template, this way is easier to explain the issue, I will be happy to elaborate more

Description

In a production server after running a long time / mass-disconnection, I have noticed memory leak according to zlib & websockets.

This happens to all servers, but extremely happened to one of our servers which got mass-disconnect from 2.5k connected users to 50 connceted users. memory dump on this server showed:

  • We have 87k instances of WebSocket class
  • AsyncLimiter Queue have thousands of items

I managed to reproduce locally via:

  • socket.io server (this my setup) with per-message deflate enabled
  • connected with 200 users simultaneously and disconnected at once - memory stays the same, did manual garbage collection, wait for 2 for minutes, taking memory dump shows server still have WebSocket classes in memory.
  • Connected again with 20 users and now I can't connect - not getting responses from the server. the server is stuck.

Root cause

Because of AsyncLimiter have 10 pending jobs that never finished, this happens as a result of a race - calling close on inflate/deflate won't close any further flushes.

Example:

inflate.write(...);
inflate.close(); // happened async
inflate.flush(() => /* won't be called */);

I made some patches in the code to make sure I call the callback, but I didn't like the solution of attaching a callback to inflate/deflate objects, since the object can handle multiple operations at the same time and we can have further racers.

In my opinion, we should decouple the inflate/deflate from per-message deflate and create a class which will have an internal queue and call to inflate/deflate, one cleanup it will flush the queue - this way we can the solve the issues I wrote above.

I can submit a pull request for the above approach.

Reproducible in:

  • ws version: 6.4.1
  • socket.io version: 2.2.0
  • Node.js version(s): v10.15.3
  • OS version(s): Mac (local) / Linux (production)

Memory dump screenshot

Screen Shot 2019-08-11 at 16 12 25

The picture above is a memory dump for the server which had mass-disconnect from 2.5k to 50 sockets:

  • 88,992 instance of PerMessageDeflate
  • 178,012 instances of WebSocket
  • Array / Queue - is AsyncLimiter queue

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions