Skip to content

Conversation

@zzy96
Copy link
Contributor

@zzy96 zzy96 commented Dec 1, 2019

Behavior

When running geth with external signer (Clef), if Clef is restarted, the first call to Clef always fail but the subsequent calls work fine.

Analysis & Fix

In rpc/client.go, when any error like EOF is sent to readErr channel, conn will be closed but c.writeConn is not set to nil. If a new request comes after conn is closed, it must fail once before it will try to reconnect. This fix is to set c.writeConn = nil when readErr is received and conn is closed.

@zzy96 zzy96 requested review from fjl and holiman as code owners December 1, 2019 01:36
@fjl fjl changed the title Set writeConn to nil when conn is closed on readErr rpc: reset writeConn when conn is closed on readErr Dec 1, 2019
@fjl
Copy link
Contributor

fjl commented Dec 1, 2019

I'm not sure setting writeConn is safe in this context. This might introduce a race. Will check tomorrow.

@fjl
Copy link
Contributor

fjl commented Dec 1, 2019

Code docs say:

// writeConn is used for writing to the connection on the caller's goroutine. It should
// only be accessed outside of dispatch, with the write lock held. The write lock is
// taken by sending on requestOp and released by sending on sendDone.

This comment is slightly out of date because requestOp has since been renamed to reqInit.
Still means we cannot set this field from inside the dispatch loop.

writeConn is closed when a read error happens. I think the best we can do to avoid the extra error on the next call is handling closed connections better on the write code path.

@zzy96
Copy link
Contributor Author

zzy96 commented Dec 2, 2019

Hi @fjl

I have tried another fix adding a retry flag for c.write.

Copy link
Contributor

@fjl fjl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, this took a long time to get reviewed because I was busy with other things. I have tested the change and it solves the issue.

@fjl fjl merged commit 44c365c into ethereum:master Jan 27, 2020
@fjl
Copy link
Contributor

fjl commented Jan 27, 2020

Damn, I didn't change the commit message title. The title is a bit misleading because the fix is now based on the write error.

@zzy96 zzy96 deleted the fix-rpc-client-call-fail-on-restart branch January 28, 2020 12:52
@holiman holiman added this to the 1.9.11 milestone Feb 3, 2020
enriquefynn pushed a commit to enriquefynn/go-ethereum that referenced this pull request Mar 10, 2021
This change makes the client attempt to reconnect when a write fails.
We already had reconnect support, but the reconnect would previously
happen on the next call after an error. Being more eager leads to a
smoother experience overall.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants