Description
System information
My current version is:
Geth
Version: 1.8.17-stable
Git Commit: 8bbe72075e4e16442c4e28d999edee12e294329e
Architecture: amd64
Protocol Versions: [63 62]
Network Id: 1
Go Version: go1.10.1
Operating System: linux
GOPATH=
GOROOT=/usr/lib/go-1.10
Expected behaviour
Sealers keep signing blocks normally.
Actual behaviour
I was running a go-ethereum private network with 6 sealers.
Each sealer is run by:
directory=/home/poa
command=/bin/bash -c 'geth --datadir sealer4/ --syncmode 'full' --port 30393 --rpc --rpcaddr 'localhost' --rpcport 8600 --rpcapi='net,web3,eth' --networkid 30 --gasprice '1' -unlock 'someaddress' --password sealer4/password.txt --mine '
The blockchain ran well for about 1-2 months.
Today I found that all the nodes were having issues. Each node was emitting the message "Signed recently, must wait for others".
I checked the logs and found this message once every hour, with no further information; the nodes were not mining:
Regenerated local transaction journal transactions=0 accounts=0
Regenerated local transaction journal transactions=0 accounts=0
Regenerated local transaction journal transactions=0 accounts=0
Regenerated local transaction journal transactions=0 accounts=0
I am experiencing the same issue with 6 sealers. I restarted each node, but now I am stuck at:
INFO [01-07|18:17:30.645] Etherbase automatically configured address=0x5Bc69DC4dba04b6955aC94BbdF129C3ce2d20D34
INFO [01-07|18:17:30.645] Commit new mining work number=488677 sealhash=a506ec…8cb403 uncles=0 txs=0 gas=0 fees=0 elapsed=133.76µs
INFO [01-07|18:17:30.645] Signed recently, must wait for others
The first weird thing is that some nodes are stuck on block 488677 and others on 488676. This behaviour was reported in issue #16406, and user @lyhbarry saw the same.
Example:
Signer 1
Signer 2
Note that there are no votes pending.
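The per-signer comparison above can also be done programmatically. The following is a minimal sketch, not a tested diagnostic tool. It assumes the `clique` namespace is reachable over JSON-RPC on each node, which is not the case with the `--rpcapi='net,web3,eth'` setting shown earlier, so the namespace would have to be enabled or queried another way. The helper names `rpc_call` and `summarize` are hypothetical, and the snapshot field names (`number`, `hash`, `votes`) are assumed to match what `clique.getSnapshot()` prints in the console.

```python
import json
import urllib.request


def rpc_call(url, method, params=None):
    """POST a JSON-RPC 2.0 request to a node and return its `result` field."""
    payload = json.dumps(
        {"jsonrpc": "2.0", "id": 1, "method": method, "params": params or []}
    ).encode()
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        body = json.load(resp)
    if "error" in body:
        raise RuntimeError(body["error"])
    return body["result"]


def summarize(snapshots):
    """Group nodes by the head their clique snapshot reports.

    `snapshots` maps a node name to its snapshot dict. Returns a pair:
    a dict {(number, hash): [node names]} and the list of nodes whose
    snapshot still contains pending votes.
    """
    heads = {}
    for name, snap in snapshots.items():
        heads.setdefault((snap["number"], snap["hash"]), []).append(name)
    pending = [name for name, snap in snapshots.items() if snap.get("votes")]
    return heads, pending


# Usage (assumption: each URL exposes the clique namespace):
#   urls = {"sealer4": "http://localhost:8600"}
#   snaps = {n: rpc_call(u, "clique_getSnapshot", ["latest"]) for n, u in urls.items()}
#   heads, pending = summarize(snaps)
```

More than one key in `heads` means the sealers disagree about the chain head, which matches the 488676 versus 488677 split described above.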
So far, I have shut down and restarted each node, and I have found that:
- Each node is paired with the others
- Each node is part of clique.getSigners()
- Each node is waiting for another to sign...
INFO [01-07|18:41:56.134] Signed recently, must wait for others
INFO [01-07|19:41:42.125] Regenerated local transaction journal transactions=0 accounts=0
INFO [01-07|18:41:56.134] Signed recently, must wait for others
So synchronization fails, and I also cannot start signing again because each node is stuck waiting for the others. Does that mean the network is useless?
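To make the "Signed recently, must wait for others" condition concrete, here is a minimal Python sketch of the Clique rule that a signer may seal at most one block out of every floor(N/2)+1 consecutive blocks. It is a simplified model for illustration, not geth's code. The signer names A to F and the two fork histories are hypothetical, and the logs above do not establish that this is what actually happened on this network.

```python
def signer_limit(num_signers):
    """Length of the window in which a signer may seal at most one block."""
    return num_signers // 2 + 1


def may_seal(signer, number, recents, num_signers):
    """Return True if `signer` may seal block `number`.

    `recents` maps block number -> signer of that block, for recent blocks.
    A signer is blocked while one of its blocks is still inside the window
    of the last (limit - 1) blocks before `number`.
    """
    limit = signer_limit(num_signers)
    for seen, recent in recents.items():
        if recent == signer and (number < limit or seen > number - limit):
            return False  # "Signed recently, must wait for others"
    return True


def able_to_seal(nodes, number, recents, num_signers):
    """Subset of `nodes` that may seal `number` on the fork described by `recents`."""
    return [n for n in nodes if may_seal(n, number, recents, num_signers)]


# Hypothetical split of a 6-signer network into two forks of equal height.
NUM_SIGNERS = 6
fork_x = {488674: "A", 488675: "B", 488676: "C"}  # followed by nodes A, B, C
fork_y = {488674: "D", 488675: "E", 488676: "F"}  # followed by nodes D, E, F

stuck_x = able_to_seal(["A", "B", "C"], 488677, fork_x, NUM_SIGNERS)
stuck_y = able_to_seal(["D", "E", "F"], 488677, fork_y, NUM_SIGNERS)
```

With 6 signers the window is 4 blocks, so the signers of the last 3 blocks are blocked. In this hypothetical split both `stuck_x` and `stuck_y` are empty: every node on each fork has signed recently on its own fork, and the signers who could continue it are following the other fork.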
The comment by @tudyzhb on that issue mentions that:
Ref clique-seal of v1.8.11, I think there is no an effective mechanism to retry seal, when an in-turn/out-of-turn seal fail occur. So our dev network useless easily.
After this problem, I took a look at the logs. Each signer has these error messages:
Synchronisation failed, dropping peer peer=7875a002affc775b err="retrieved hash chain is invalid"
INFO [01-02|16:42:10.902] Signed recently, must wait for others
WARN [01-02|16:42:11.960] Synchronisation failed, dropping peer peer=7875a002affc775b err="retrieved hash chain is invalid"
INFO [01-02|16:42:12.128] Imported new chain segment blocks=1 txs=0 mgas=0.000 elapsed=540.282µs mgasps=0.000 number=488116 hash=269920…afd3c7 cache=5.99kB
INFO [01-02|16:42:12.129] Commit new mining work number=488117 sealhash=f7b00c…787d5c uncles=2 txs=0 gas=0 fees=0 elapsed=307.314µs
INFO [01-02|16:42:20.929] Successfully sealed new block number=488117 sealhash=f7b00c…787d5c hash=f17438…93ffe3 elapsed=8.800s
INFO [01-02|16:42:20.929] 🔨 mined potential block number=488117 hash=f17438…93ffe3
INFO [01-02|16:42:20.930] Commit new mining work number=488118 sealhash=b09b33…1526ba uncles=2 txs=0 gas=0 fees=0 elapsed=520.754µs
INFO [01-02|16:42:20.930] Signed recently, must wait for others
INFO [01-02|16:42:31.679] Imported new chain segment blocks=1 txs=0 mgas=0.000 elapsed=2.253ms mgasps=0.000 number=488118 hash=763a32…a579f5 cache=5.99kB
INFO [01-02|16:42:31.680] 🔗 block reached canonical chain number=488111 hash=3d44dc…df0be5
INFO [01-02|16:42:31.680] Commit new mining work number=488119 sealhash=c8a5e7…db78a1 uncles=2 txs=0 gas=0 fees=0 elapsed=214.155µs
INFO [01-02|16:42:31.680] Signed recently, must wait for others
INFO [01-02|16:42:40.901] Imported new chain segment blocks=1 txs=0 mgas=0.000 elapsed=808.903µs mgasps=0.000 number=488119 hash=accc3f…44bc4c cache=5.99kB
INFO [01-02|16:42:40.901] Commit new mining work number=488120 sealhash=f73978…c03fa7 uncles=2 txs=0 gas=0 fees=0 elapsed=275.72µs
INFO [01-02|16:42:40.901] Signed recently, must wait for others
WARN [01-02|16:42:41.961] Synchronisation failed, dropping peer peer=7875a002affc775b err="retrieved hash chain is invalid"
I also see some:
INFO [01-02|16:58:10.902] 😱 block lost number=488205 hash=1fb1c5…a41a42
This hash chain error was only a warning, so the nodes kept mining until the 2nd of January; then I saw this on each of the 6 nodes.
I have seen that there are a lot of issues about this error. The most similar is the one I linked here, but it is unresolved.
Most of the workarounds in those issues seem to be a restart, but in this case the chain seems to be in an inconsistent state and the nodes are always waiting for the others.
So,
- Any ideas? Peers are connected and accounts are unlocked; it just entered a deadlock after 450k blocks.
- Are there any logs I can provide? I only see the warnings described above and the "block lost" message, but nothing from when the nodes stopped mining.
- Is this PR related? les: fix fetcher syncing logic #18072
- Maybe it is related to the comment by @karalabe on this issue: Geth signing stops after a period of time #16406?
- Will upgrading from 1.8.17 to 1.8.20 solve this?
- In my opinion, this looks like a race condition or something similar, since I have 2 chains, one running for two months and the other for three months, and this is the first time this error has happened.
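On the question of which logs would help: one way to pin down when each node stopped sealing is to scan its log for the last "Successfully sealed new block" entry and count the "Synchronisation failed" warnings. The sketch below assumes log lines in the format quoted in this report; the function names `last_sealed` and `count_sync_failures` are hypothetical.

```python
import re

# Matches lines such as:
#   INFO [01-02|16:42:20.929] Successfully sealed new block number=488117 ...
LINE = re.compile(r"^(?P<level>[A-Z]+)\s+\[(?P<ts>[^\]]+)\]\s+(?P<msg>.*)$")
NUMBER = re.compile(r"\bnumber=(\d+)")


def last_sealed(lines):
    """Return (timestamp, block number) of the last sealed block, or None."""
    last = None
    for line in lines:
        m = LINE.match(line.strip())
        if not m or not m.group("msg").startswith("Successfully sealed new block"):
            continue
        n = NUMBER.search(m.group("msg"))
        if n:
            last = (m.group("ts"), int(n.group(1)))
    return last


def count_sync_failures(lines):
    """Count lines reporting a failed synchronisation with a peer."""
    return sum("Synchronisation failed" in line for line in lines)


# Usage (assumption: the node's output was redirected to a file):
#   with open("sealer4.log", encoding="utf-8") as f:
#       lines = f.readlines()
#   print(last_sealed(lines), count_sync_failures(lines))
```

Comparing the last sealed block across all six sealers would show whether they stopped at the same height or drifted apart first.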
These are other related issues:
#16444 (same issue, but I don't have votes pending in my snapshot)


