Description
Please note that I am using a geth fork with a modified Clique consensus; however, the code around sync, the downloader, etc. is unchanged. The geth version my fork is based on, v1.10.26, is quite old. If rebasing onto the latest version might solve the issue, I will try that. I would be very grateful for any help or hints on how to solve this issue.
System information
Geth version: Geth/v1.10.26-stable-f05c9b7d/darwin-amd64/go1.20.3
CL client & version: none (pre-merge private PoA network)
OS & Version: Linux (Ubuntu 20.04.4 LTS) & macOS Monterey 12.6.4 (21G526)
Background
I am operating three private PoA networks running on a geth fork with a slightly modified Clique consensus.
- Local testnet on my Mac (3 nodes)
- Testnet (~200 nodes)
- Mainnet (~3000 nodes)
Recently I noticed that new nodes on both Testnet (2) and Mainnet (3) have started failing to sync (both full and snap sync) when syncing from block zero.
The problem does not occur when syncing the blockchain data of the local testnet (1).
Syncing up to the head starting from a recent block works fine on all networks.
I exported the blockchain data from both Testnet (2) and Mainnet (3), imported it into a new local testnet (2 nodes) on my Mac, and was able to reproduce the problem.
Expected behaviour
Fresh sync (both full and snap) completes successfully.
Actual behaviour
Geth fails at the beginning of a fresh sync (both full and snap) with the following errors.
Testnet blockchain data:
(...)
DEBUG[07-07|12:44:54.269] Fetching batch of headers id=ef23378316092d8f conn=trusted-staticdial count=192 fromnum=193 skip=0 reverse=false
TRACE[07-07|12:44:54.270] Skeleton filling not accepted peer=ef23378316092d8f from=193
DEBUG[07-07|12:44:54.270] Failed to deliver retrieved headers peer=ef233783 err="delivery not accepted"
(...)
DEBUG[07-07|12:44:54.373] Fetching batch of headers id=ef23378316092d8f conn=trusted-staticdial count=192 fromnum=6145 skip=0 reverse=false
TRACE[07-07|12:44:54.373] Skeleton filling not accepted peer=ef23378316092d8f from=6145
DEBUG[07-07|12:44:54.373] Failed to deliver retrieved headers peer=ef233783 err="delivery not accepted"
(...)
DEBUG[07-07|12:44:54.560] Skeleton fill failed err="no peers available or all tried for download"
DEBUG[07-07|12:44:54.560] Skeleton chain invalid peer=ef233783 err="no peers available or all tried for download"
DEBUG[07-07|12:44:54.560] Header download terminated peer=ef233783
DEBUG[07-07|12:44:54.561] Block body download terminated err="syncing canceled (requested)"
DEBUG[07-07|12:44:54.561] Receipt download terminated err="syncing canceled (requested)"
DEBUG[07-07|12:44:54.636] Synchronisation terminated elapsed=381.552ms
WARN [07-07|12:44:54.636] Synchronisation failed, dropping peer peer=ef23378316092d8f85540b7b17a32d5fa791018ba748ea9e663bab26b2496822 err="retrieved hash chain is invalid: no peers available or all tried for download"
(...)
Mainnet blockchain data:
(...)
DEBUG[07-07|12:54:16.867] Fetching batch of headers id=ef23378316092d8f conn=trusted-staticdial count=192 fromnum=193 skip=0 reverse=false
TRACE[07-07|12:54:16.867] Skeleton filling not accepted peer=ef23378316092d8f from=193
DEBUG[07-07|12:54:16.867] Failed to deliver retrieved headers peer=ef233783 err="delivery not accepted"
(...)
DEBUG[07-07|12:54:17.123] Skeleton fill failed err="no peers available or all tried for download"
DEBUG[07-07|12:54:17.123] Skeleton chain invalid peer=ef233783 err="no peers available or all tried for download"
DEBUG[07-07|12:54:17.123] Header download terminated peer=ef233783
DEBUG[07-07|12:54:17.123] Block body download terminated err="syncing canceled (requested)"
DEBUG[07-07|12:54:17.123] Receipt download terminated err="syncing canceled (requested)"
DEBUG[07-07|12:54:17.175] Synchronisation terminated elapsed=321.873ms
WARN [07-07|12:54:17.175] Synchronisation failed, dropping peer peer=ef23378316092d8f85540b7b17a32d5fa791018ba748ea9e663bab26b2496822 err="retrieved hash chain is invalid: no peers available or all tried for download"
(...)
Local testnet blockchain data syncs successfully:
(...)
DEBUG[07-07|14:24:18.635] Fetching batch of headers id=ef23378316092d8f conn=trusted-staticdial count=192 fromnum=193 skip=0 reverse=false
TRACE[07-07|14:24:18.637] Pre-scheduled new headers peer=ef23378316092d8f count=192 from=193
TRACE[07-07|14:24:18.637] Delivered new batch of headers peer=ef233783 count=192 accepted=192
(...)
DEBUG[07-07|14:24:18.685] Fetching batch of headers id=ef23378316092d8f conn=trusted-staticdial count=192 fromnum=6145 skip=0 reverse=false
TRACE[07-07|14:24:18.687] Pre-scheduled new headers peer=ef23378316092d8f count=192 from=6145
TRACE[07-07|14:24:18.687] Delivered new batch of headers peer=ef233783 count=192 accepted=192
(...)
DEBUG[07-07|14:28:22.399] Inserting downloaded chain items=2 firstnum=470,749 firsthash=d9cd1c..128ee4 lastnum=470,750 lasthash=639be6..56b593
DEBUG[07-07|14:28:22.399] No more headers available peer=ef233783
DEBUG[07-07|14:28:22.400] Header download terminated peer=ef233783
DEBUG[07-07|14:28:22.400] Block body download terminated err=nil
DEBUG[07-07|14:28:22.400] Receipt download terminated err=nil
INFO [07-07|14:28:22.401] Imported new chain segment blocks=2 txs=0 mgas=0.000 elapsed=1.587ms mgasps=0.000 number=470,750 hash=639be6..56b593 age=1w3d1h dirty=218.47KiB
DEBUG[07-07|14:28:22.401] Synchronisation terminated elapsed=4m3.786s
TRACE[07-07|14:28:22.403] Announced block hash=639be6..56b593 recipients=1 duration=2562047h47m16.854s
TRACE[07-07|14:28:22.403] Announced block id=ef23378316092d8f conn=trusted-staticdial number=470,750 hash=639be6..56b593
(...)
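The key difference between the failing runs and the successful one is how each 192-header batch is handled: `Skeleton filling not accepted` versus `Delivered new batch of headers ... accepted=192`. As a rough mental model of what such a rejection means, here is a simplified Go sketch. It assumes (this is not geth's actual code) that a fill batch is accepted only if it contains exactly 192 headers, starts at the requested block number, ends at the header already fetched for the skeleton, and forms an unbroken parent-hash chain. The `header` type, its SHA-256 `hash` method, `acceptFill` and `buildChain` are hypothetical stand-ins; real headers are hashed with Keccak-256 over their RLP encoding, and the real checks live in geth's `downloader` package and may differ in detail.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// header is a hypothetical, stripped-down stand-in for a block header
// holding only the fields the fill check below needs.
type header struct {
	Number     uint64
	ParentHash [32]byte
}

// hash is a toy hash (SHA-256 over number and parent hash), NOT Ethereum's
// Keccak-256 over the RLP-encoded header.
func (h header) hash() [32]byte {
	var buf [40]byte
	binary.BigEndian.PutUint64(buf[:8], h.Number)
	copy(buf[8:], h.ParentHash[:])
	return sha256.Sum256(buf[:])
}

// maxHeaderFetch matches count=192 in the "Fetching batch of headers" logs.
const maxHeaderFetch = 192

// acceptFill models the assumed checks on one skeleton fill batch.
func acceptFill(from uint64, skeletonHash [32]byte, headers []header) (bool, string) {
	if len(headers) != maxHeaderFetch {
		return false, fmt.Sprintf("got %d headers, want %d", len(headers), maxHeaderFetch)
	}
	if headers[0].Number != from {
		return false, "first header number mismatch"
	}
	if headers[len(headers)-1].hash() != skeletonHash {
		return false, "last header does not match skeleton"
	}
	for i := 1; i < len(headers); i++ {
		if headers[i].Number != from+uint64(i) {
			return false, "non-consecutive numbers"
		}
		if headers[i].ParentHash != headers[i-1].hash() {
			return false, "broken parent link"
		}
	}
	return true, "accepted"
}

// buildChain creates n correctly linked toy headers starting at block from.
func buildChain(from uint64, n int) []header {
	out := make([]header, n)
	var parent [32]byte
	for i := range out {
		out[i] = header{Number: from + uint64(i), ParentHash: parent}
		parent = out[i].hash()
	}
	return out
}

func main() {
	chain := buildChain(193, maxHeaderFetch)
	skeleton := chain[len(chain)-1].hash()

	ok, why := acceptFill(193, skeleton, chain)
	fmt.Println(ok, why) // true accepted

	// A short response is rejected even though every header in it is valid.
	ok, why = acceptFill(193, skeleton, chain[:100])
	fmt.Println(ok, why) // false got 100 headers, want 192
}
```

Under this model, a batch can be rejected without any individual header being invalid (for example, if the peer answers with fewer than 192 headers); the logs above do not show which condition actually triggers here.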
Steps to reproduce the behaviour
Below are the steps I follow to reproduce the problem with a two-node network on my Mac.
1. Init the first node (the node that has the full blockchain data):
   geth --datadir ./node1 init genesis.json
2. Init the second node (the node that tries to full sync from block zero):
   geth --datadir ./node2 init genesis.json
3. Import the blockchain data into the first node from a file:
   geth --datadir ./node1 import tnet-blockchain-99498.backup
   The import completes successfully, so I assume the problem is not corrupted blockchain data (bad blocks, etc.).
4. Run the first node:
   geth --datadir ./node1 --networkid 116111110106 --syncmode "full" --port 30314 --nat "extip:127.0.0.1" --authrpc.port "8602" --nodiscover --verbosity 3 --vmodule=eth=5,p2p=5,downloader=5
5. Run the second node:
   geth --datadir ./node2 --networkid 116111110106 --syncmode "full" --port 30316 --nat "extip:127.0.0.1" --authrpc.port "8603" --nodiscover --verbosity 3 --vmodule=eth=5,p2p=5,downloader=5

(Discovery is disabled with --nodiscover; I am using static-nodes.json/trusted-nodes.json to connect the two nodes.)
If in step 3 I import the blockchain data from the original local testnet that was running on my Mac, the full sync completes successfully. If instead I import the blockchain data from either the Testnet or Mainnet, the full sync fails at block 193 (and at block 6145 for the Testnet data).
I found that I can make the node sync successfully by bootstrapping the sync process with some initial blockchain data. If I import the first 6228 Testnet blocks, or the first 280 Mainnet blocks, from a file into node2, the full sync completes successfully.
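The block numbers above can be lined up with the header batches in the logs. The following Go sketch assumes, as a simplification, that a sync starts right after the local head and requests fixed batches of 192 headers (`count=192` in the logs); it ignores the skeleton and the common-ancestor lookup that the real downloader performs. `fillOrigin` and `batchContaining` are hypothetical helpers, not geth functions.

```go
package main

import "fmt"

// batchSize mirrors count=192 in the "Fetching batch of headers" log lines.
const batchSize = 192

// fillOrigin returns the first block number of the k-th header batch when
// the local chain head is at block head (0 on a freshly initialised node).
func fillOrigin(head, k uint64) uint64 {
	return head + 1 + k*batchSize
}

// batchContaining returns the index and first block of the batch that would
// hold block n during a sync starting from head. Requires n > head.
func batchContaining(head, n uint64) (k, start uint64) {
	k = (n - head - 1) / batchSize
	return k, fillOrigin(head, k)
}

func main() {
	// Fresh node (head = 0): the rejected fromnum values are batch starts.
	fmt.Println(fillOrigin(0, 1), fillOrigin(0, 32)) // 193 6145

	// The bootstrap sizes from the report fall inside those batches.
	for _, n := range []uint64{280, 6228} {
		k, start := batchContaining(0, n)
		fmt.Printf("block %d: batch %d [%d, %d]\n", n, k, start, start+batchSize-1)
	}
	// block 280: batch 1 [193, 384]
	// block 6228: batch 32 [6145, 6336]

	// After such an import, the first batch starts right after the new head.
	fmt.Println(fillOrigin(280, 0), fillOrigin(6228, 0)) // 281 6229
}
```

Under these assumptions, 193 and 6145 are batch boundaries on a fresh node, and each bootstrap head (280 for Mainnet, 6228 for Testnet) falls inside the last batch that was rejected for that network, so after the import that batch is never requested with the same boundaries. This is only arithmetic on the reported numbers and does not by itself identify the cause.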
Backtrace
1. Init the first node (the node that has the full blockchain data):
   step1_init_node1_log.txt
2. Init the second node (the node that tries to full sync from block zero):
   step2_init_node2_log.txt
3. Import the blockchain data into the first node from a file:
   step3_import_node1_log.txt
4. Run the first node:
   step4_run_node1_log.txt
5. Run the second node (full sync fails):
   step5_sync_node2_log.txt