Skip to content

Conversation

@janos
Copy link
Contributor

@janos janos commented Jan 23, 2019

Fixes: ethersphere/swarm#1142.

Swarm PR ethersphere/swarm#1147.

While performing the test described in ethersphere/swarm#1142 issue, a large number of gorutines were created and not terminated. Stack trace shows many goroutines:

goroutine 3370 [select]:
github.com/ethereum/go-ethereum/accounts.(*Manager).update(0xc0000b7520)
	/Users/janos/go/src/github.com/ethereum/go-ethereum/accounts/manager.go:95 +0x22a
created by github.com/ethereum/go-ethereum/accounts.NewManager
	/Users/janos/go/src/github.com/ethereum/go-ethereum/accounts/manager.go:68 +0x6f2

goroutine 4481 [select]:
github.com/ethereum/go-ethereum/accounts/keystore.(*watcher).loop(0xc0035ac9a0)
	/Users/janos/go/src/github.com/ethereum/go-ethereum/accounts/keystore/watch.go:94 +0x65b
created by github.com/ethereum/go-ethereum/accounts/keystore.(*watcher).start
	/Users/janos/go/src/github.com/ethereum/go-ethereum/accounts/keystore/watch.go:52 +0xb6

Accounts manager is created but not closed for node.Node, leaving large number of goroutines alive. This change closes accounts manager on Node.Stop, but Node initializes accounts manager in constructor, not no Start, so for Restart and Start to work as expected, accounts manager construction and closing is handled in Start as well, preserving the original required initialization in constructor.

Keystore watch creates fs watchers and we have noticed before a lot of (FSEvents.framework) FSEventStreamStart: register_with_server: ERROR: f2d_register_rpc() => (null) (-21) errors on macOS while running a large scale simulations. This change may improve simulations performance as it closes unneeded file watchers.

@nonsense
Copy link
Member

Removing swarm label, as this affects not only Swarm.

node/node.go Outdated

if n.accman != nil {
if err := n.accman.Close(); err != nil {
return err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really correct to return here? Wouldn't it be better to continue closing remaining things, and then return the (potentially aggregated) error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right. It is better to return the error at the end.


if n.accman == nil {
var err error
n.accman, n.ephemeralKeystore, err = makeAccountManager(n.config)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, with this change, the account manager is (potentially) started from two different locations, right? Isn't it possible to remove the other location, and only use this one to make things less complex?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, it is started in constructor and in Start functions. But this comment suggests that we have to have accounts manager before start

// Ensure that the AccountManager method works before the node has started.
.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The node was previously made to be started once, stopped and discarded. I think it is in general a lot cleaner to recreate the instance instead of restarting it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, at a second glance, it does handle restarts. Lemme think about this.

@frncmx
Copy link
Contributor

frncmx commented Jan 31, 2019

@fjl @karalabe would you please take a look on this as it touches your code parts?

close(n.stop)

// Close the accounts manager.
var accmanError error
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rename to accmanErr to remain consistent with Go's naming conventions.

@karalabe
Copy link
Member

karalabe commented Feb 5, 2019

In general I don't like this solution. The current invariant is that when you have a node, you have a valid account manager. This PR breaks it by making the account manager unavailable between Stop->Start calls.

Perhaps a cleaner solution would be to retain Start()/Stop() as they are now, but also add a Close() method that tears down the account manager (and calls Stop just in case). Then calling code can restarts the node many times without the goroutine spam, and when it's done, it can just close and clean everything up.

@janos
Copy link
Contributor Author

janos commented Feb 5, 2019

Thanks @karalabe. I will create a different PR with much better and cleaner implementation that you suggested. This PR was not addressed for a while and I have to see why I did not go with Node.Close. Probably because p2p/simulations/adapters.Node does not have Close function and is not handling it. Also it requires to move the ephemeralKeystore cleanup to the new Close function, which may introduce other problems as well. Maybe the intention was not to close the account manager at all?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

network/simulation: Goroutine leak (simultaneously alive, limit exceeded)

5 participants