-
Notifications
You must be signed in to change notification settings - Fork 21.5k
node: fix goroutine leak by closing accounts manager on Node.Stop #18505
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Removing |
node/node.go
Outdated
|
|
||
| if n.accman != nil { | ||
| if err := n.accman.Close(); err != nil { | ||
| return err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it really correct to return here? Wouldn't it be better to continue closing remaining things, and then return the (potentially aggregated) error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are right. It is better to return the error at the end.
|
|
||
| if n.accman == nil { | ||
| var err error | ||
| n.accman, n.ephemeralKeystore, err = makeAccountManager(n.config) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, with this change, the account manager is (potentially) started from two different locations, right? Isn't it possible to remove the other location, and only use this one to make things less complex?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, it is started in constructor and in Start functions. But this comment suggests that we have to have accounts manager before start
Line 100 in 3b73e3d
| // Ensure that the AccountManager method works before the node has started. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The node was previously made to be started once, stopped and discarded. I think it is in general a lot cleaner to recreate the instance instead of restarting it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, at a second glance, it does handle restarts. Lemme think about this.
| close(n.stop) | ||
|
|
||
| // Close the accounts manager. | ||
| var accmanError error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please rename to accmanErr to remain consistent with Go's naming conventions.
|
In general I don't like this solution. The current invariant is that when you have a node, you have a valid account manager. This PR breaks it by making the account manager unavailable between Stop->Start calls. Perhaps a cleaner solution would be to retain Start()/Stop() as they are now, but also add a Close() method that tears down the account manager (and calls Stop just in case). Then calling code can restarts the node many times without the goroutine spam, and when it's done, it can just close and clean everything up. |
|
Thanks @karalabe. I will create a different PR with much better and cleaner implementation that you suggested. This PR was not addressed for a while and I have to see why I did not go with Node.Close. Probably because p2p/simulations/adapters.Node does not have Close function and is not handling it. Also it requires to move the ephemeralKeystore cleanup to the new Close function, which may introduce other problems as well. Maybe the intention was not to close the account manager at all? |
Fixes: ethersphere/swarm#1142.
Swarm PR ethersphere/swarm#1147.
While performing the test described in ethersphere/swarm#1142 issue, a large number of gorutines were created and not terminated. Stack trace shows many goroutines:
Accounts manager is created but not closed for node.Node, leaving large number of goroutines alive. This change closes accounts manager on Node.Stop, but Node initializes accounts manager in constructor, not no Start, so for Restart and Start to work as expected, accounts manager construction and closing is handled in Start as well, preserving the original required initialization in constructor.
Keystore watch creates fs watchers and we have noticed before a lot of
(FSEvents.framework) FSEventStreamStart: register_with_server: ERROR: f2d_register_rpc() => (null) (-21)errors on macOS while running a large scale simulations. This change may improve simulations performance as it closes unneeded file watchers.