Private PoA network sync fails when trying to sync from block zero. #27667

@jakub-freebit

Please note that I am using a geth fork with a modified Clique consensus; however, the code around sync, the downloader, etc. is unchanged. The base version, geth v1.10.26, is quite old. If rebasing onto the latest version might solve the issue, I will try that. I would be very grateful for any help or hints on how to solve this issue.

System information

Geth version: Geth/v1.10.26-stable-f05c9b7d/darwin-amd64/go1.20.3
CL client & version: none (pre-merge private PoA network)
OS & Version: Linux (Ubuntu 20.04.4 LTS) & macOS Monterey 12.6.4 (21G526)

Background

I am operating three private PoA networks running on a geth fork with a slightly modified Clique consensus.

  1. Local testnet on my Mac (3 nodes)
  2. Testnet (~200 nodes)
  3. Mainnet (~3000 nodes)

Recently I have noticed that new nodes on both Testnet (2) and Mainnet (3) started failing to sync (both full and snap sync) when syncing from block zero.
The problem does not occur when trying to sync the blockchain data from the local testnet (1).
Syncing up to the head starting from a recent block works OK on all networks.
I exported the blockchain data from both Testnet (2) and Mainnet (3), imported them into a new local testnet (2 nodes) on my Mac and was able to reproduce the problem.

Expected behaviour

Fresh sync (both full and snap) completes successfully.

Actual behaviour

Geth fails at the beginning of a fresh sync (both full and snap) with the following errors.

Testnet blockchain data:

(...)
DEBUG[07-07|12:44:54.269] Fetching batch of headers                id=ef23378316092d8f conn=trusted-staticdial count=192 fromnum=193   skip=0   reverse=false
TRACE[07-07|12:44:54.270] Skeleton filling not accepted            peer=ef23378316092d8f from=193
DEBUG[07-07|12:44:54.270] Failed to deliver retrieved headers      peer=ef233783         err="delivery not accepted"
(...)
DEBUG[07-07|12:44:54.373] Fetching batch of headers                id=ef23378316092d8f conn=trusted-staticdial count=192 fromnum=6145  skip=0   reverse=false
TRACE[07-07|12:44:54.373] Skeleton filling not accepted            peer=ef23378316092d8f from=6145
DEBUG[07-07|12:44:54.373] Failed to deliver retrieved headers      peer=ef233783         err="delivery not accepted"
(...)
DEBUG[07-07|12:44:54.560] Skeleton fill failed                     err="no peers available or all tried for download"
DEBUG[07-07|12:44:54.560] Skeleton chain invalid                   peer=ef233783         err="no peers available or all tried for download"
DEBUG[07-07|12:44:54.560] Header download terminated               peer=ef233783
DEBUG[07-07|12:44:54.561] Block body download terminated           err="syncing canceled (requested)"
DEBUG[07-07|12:44:54.561] Receipt download terminated              err="syncing canceled (requested)"
DEBUG[07-07|12:44:54.636] Synchronisation terminated               elapsed=381.552ms
WARN [07-07|12:44:54.636] Synchronisation failed, dropping peer    peer=ef23378316092d8f85540b7b17a32d5fa791018ba748ea9e663bab26b2496822 err="retrieved hash chain is invalid: no peers available or all tried for download"
(...)

Mainnet blockchain data:

(...)
DEBUG[07-07|12:54:16.867] Fetching batch of headers                id=ef23378316092d8f conn=trusted-staticdial count=192 fromnum=193     skip=0   reverse=false
TRACE[07-07|12:54:16.867] Skeleton filling not accepted            peer=ef23378316092d8f from=193
DEBUG[07-07|12:54:16.867] Failed to deliver retrieved headers      peer=ef233783         err="delivery not accepted"
(...)
DEBUG[07-07|12:54:17.123] Skeleton fill failed                     err="no peers available or all tried for download"
DEBUG[07-07|12:54:17.123] Skeleton chain invalid                   peer=ef233783         err="no peers available or all tried for download"
DEBUG[07-07|12:54:17.123] Header download terminated               peer=ef233783
DEBUG[07-07|12:54:17.123] Block body download terminated           err="syncing canceled (requested)"
DEBUG[07-07|12:54:17.123] Receipt download terminated              err="syncing canceled (requested)"
DEBUG[07-07|12:54:17.175] Synchronisation terminated               elapsed=321.873ms
WARN [07-07|12:54:17.175] Synchronisation failed, dropping peer    peer=ef23378316092d8f85540b7b17a32d5fa791018ba748ea9e663bab26b2496822 err="retrieved hash chain is invalid: no peers available or all tried for download"
(...)

Local testnet blockchain data syncs successfully:

(...)
DEBUG[07-07|14:24:18.635] Fetching batch of headers                id=ef23378316092d8f conn=trusted-staticdial count=192 fromnum=193     skip=0   reverse=false
TRACE[07-07|14:24:18.637] Pre-scheduled new headers                peer=ef23378316092d8f count=192 from=193
TRACE[07-07|14:24:18.637] Delivered new batch of headers           peer=ef233783         count=192 accepted=192
(...)
DEBUG[07-07|14:24:18.685] Fetching batch of headers                id=ef23378316092d8f conn=trusted-staticdial count=192 fromnum=6145    skip=0   reverse=false
TRACE[07-07|14:24:18.687] Pre-scheduled new headers                peer=ef23378316092d8f count=192 from=6145
TRACE[07-07|14:24:18.687] Delivered new batch of headers           peer=ef233783         count=192 accepted=192
(...)
DEBUG[07-07|14:28:22.399] Inserting downloaded chain               items=2    firstnum=470,749 firsthash=d9cd1c..128ee4 lastnum=470,750 lasthash=639be6..56b593
DEBUG[07-07|14:28:22.399] No more headers available                peer=ef233783
DEBUG[07-07|14:28:22.400] Header download terminated               peer=ef233783
DEBUG[07-07|14:28:22.400] Block body download terminated           err=nil
DEBUG[07-07|14:28:22.400] Receipt download terminated              err=nil
INFO [07-07|14:28:22.401] Imported new chain segment               blocks=2    txs=0       mgas=0.000    elapsed=1.587ms     mgasps=0.000   number=470,750 hash=639be6..56b593 age=1w3d1h     dirty=218.47KiB
DEBUG[07-07|14:28:22.401] Synchronisation terminated               elapsed=4m3.786s
TRACE[07-07|14:28:22.403] Announced block                          hash=639be6..56b593 recipients=1 duration=2562047h47m16.854s
TRACE[07-07|14:28:22.403] Announced block                          id=ef23378316092d8f conn=trusted-staticdial number=470,750 hash=639be6..56b593
(...)
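
To compare the failing and succeeding runs more easily, the rejected header batches can be pulled out of a debug log automatically. The following is a minimal sketch, assuming the log line format shown above ("Skeleton filling not accepted ... from=N"); the function name `rejectedBatches` and the regular expression are illustrative helpers, not part of geth.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// rejectedFrom matches the TRACE line printed when a header batch is refused
// and captures the starting block number (the value after "from=").
// The pattern is an assumption based on the log excerpts in this report.
var rejectedFrom = regexp.MustCompile(`Skeleton filling not accepted\s+peer=\S+\s+from=(\d+)`)

// rejectedBatches returns the starting block numbers of all rejected
// header batches found in the log text, in the order they appear.
func rejectedBatches(log string) []uint64 {
	var starts []uint64
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		m := rejectedFrom.FindStringSubmatch(sc.Text())
		if m == nil {
			continue // not a rejection line
		}
		n, err := strconv.ParseUint(m[1], 10, 64)
		if err != nil {
			continue // skip values that do not fit in a uint64
		}
		starts = append(starts, n)
	}
	return starts
}

func main() {
	// Three lines copied from the Testnet log excerpt above.
	sample := `TRACE[07-07|12:44:54.270] Skeleton filling not accepted            peer=ef23378316092d8f from=193
DEBUG[07-07|12:44:54.270] Failed to deliver retrieved headers      peer=ef233783         err="delivery not accepted"
TRACE[07-07|12:44:54.373] Skeleton filling not accepted            peer=ef23378316092d8f from=6145`
	fmt.Println(rejectedBatches(sample)) // prints [193 6145]
}
```

Running the same helper over the full contents of step5_sync_node2_log.txt would list every batch start that node2 refused, which may help narrow down which block ranges are affected.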

Steps to reproduce the behaviour

Below are the steps I follow to reproduce the problem with a two node network on my Mac.

  1. Init the first node (the node that has the full blockchain data)
    geth --datadir ./node1 init genesis.json

  2. Init the second node (the node that tries to full sync from block zero)
    geth --datadir ./node2 init genesis.json

  3. Import the blockchain data into the first node from a file
    geth --datadir ./node1 import tnet-blockchain-99498.backup
    The import completes successfully, so I assume the problem is not caused by corrupted blockchain data (bad blocks, etc.).

  4. Run the first node
    geth --datadir ./node1 --networkid 116111110106 --syncmode "full" --port 30314 --nat "extip:127.0.0.1" --authrpc.port "8602" --nodiscover --verbosity 3 --vmodule=eth=5,p2p=5,downloader=5

  5. Run the second node
    geth --datadir ./node2 --networkid 116111110106 --syncmode "full" --port 30316 --nat "extip:127.0.0.1" --authrpc.port "8603" --nodiscover --verbosity 3 --vmodule=eth=5,p2p=5,downloader=5
    (I am using static-nodes.json/trusted-nodes.json for node discovery)

If in step 3 I import the blockchain data from the original local testnet that was running on my Mac, the full sync completes successfully. If instead I import the blockchain data from either the Testnet or Mainnet, the full sync fails at block 193 (and at block 6145 for the Testnet data).

I found out that I can make the node sync successfully by bootstrapping the sync process with some initial blockchain data. If I import the first 6228 Testnet blocks, or the first 280 Mainnet blocks, from a file into node2, the full sync completes successfully.
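
The block numbers above can be related to the header batches in the logs with a little arithmetic. The sketch below assumes that header batches are contiguous, contain 192 headers each (the `count=192` seen in the logs), and that the first batch starts at block 1; the helper `batchRange` is hypothetical and only illustrates the calculation. Under these assumptions, the bootstrap thresholds 280 and 6228 fall inside the same batches that were rejected (starting at 193 and 6145). This is an observation, not an explanation of the root cause.

```go
package main

import "fmt"

// batchSize is the number of headers per request seen in the logs (count=192).
const batchSize = 192

// batchRange returns the first and last block number of the 192-header batch
// that contains block n, assuming contiguous batches starting at block 1.
func batchRange(n uint64) (first, last uint64) {
	if n == 0 {
		return 0, 0 // the genesis block is not part of any downloaded batch
	}
	first = (n-1)/batchSize*batchSize + 1
	return first, first + batchSize - 1
}

func main() {
	// 193 and 6145 are the rejected batch starts from the logs;
	// 280 and 6228 are the bootstrap block counts that made the sync succeed.
	for _, n := range []uint64{193, 6145, 280, 6228} {
		first, last := batchRange(n)
		fmt.Printf("block %d -> batch %d..%d\n", n, first, last)
	}
}
```

With these inputs, blocks 193 and 280 map to the batch 193..384, and blocks 6145 and 6228 map to the batch 6145..6336.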

Backtrace

  1. Init the first node (the node that has the full blockchain data)
    step1_init_node1_log.txt

  2. Init the second node (the node that tries to full sync from block zero)
    step2_init_node2_log.txt

  3. Import the blockchain data into the first node from a file
    step3_import_node1_log.txt

  4. Run the first node
    step4_run_node1_log.txt

  5. Run the second node: full sync fails
    step5_sync_node2_log.txt
