Moving consensus engine into VM #1875
```python
@cached_property
def _consensus(self) -> ConsensusAPI:
    return self.consensus_class(self.chaindb.db)
```
Doesn't this mean each VM would get its own instance of consensus_class? That would be tragic, as the ConsensusAPI can be stateful (Clique keeps an in-memory cache). We currently do not have a test case for this but it would be good to add one. It should be easy to modify this to keep consensus state in memory and mine blocks that pass over VM boundaries.
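For illustration only, here is a minimal sketch of the difference at stake; the class and function names are hypothetical, not py-evm APIs:

```python
class StatefulEngine:
    """Stand-in for a ConsensusAPI implementation that keeps in-memory state."""

    def __init__(self, base_db):
        self.base_db = base_db
        self.header_cache = {}  # e.g. Clique's in-memory HeaderCache


# Per-VM construction (what the @cached_property above amounts to): every VM
# builds a fresh engine, so header_cache starts empty at each VM/fork boundary.
def consensus_for_new_vm(base_db):
    return StatefulEngine(base_db)


# Shared construction: the chain builds one engine and hands the same instance
# to every VM, so cached state survives VM boundaries -- at the cost of the
# consensus engine no longer being stateless.
SHARED_ENGINE = StatefulEngine(base_db={})

def consensus_for_new_vm_shared(_base_db):
    return SHARED_ENGINE
```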
Yes, and I think it is important to figure out how to make it stateless. Everything else that might change in a fork (in the VM) can be built up and torn down on demand, which we lean on heavily to ensure smooth fork transitions. Managing a change in Clique at a block boundary will be a nightmare if it requires some persistent in-memory store.
There must be some way for Clique to load the data from disk. How long does that take in practice?
Mmmh.. ok, let's see. So, there are multiple places of caching in the current clique implementation, but let's ignore the ones that are for performance and look at the HeaderCache, which I think is the problematic one.
To recall, in Clique you cannot validate the seal of a header unless you have access to the previous header. Let's assume we validate a series of headers via validate_chain(...). These headers may span over multiple VMs and result in a validate_seal(...) call on each of these VMs. By the time that we validate the seal of the second header (or third, fourth, nth), the previous header won't be in the database yet (after all, we are just in the middle of validating a whole series, so none of it will get to the db until we've validated the entire series).
So in order to be able to look it up on subsequent validate_seal calls, we first put it into our HeaderCache when validate_seal is called.
py-evm/eth/consensus/clique/clique.py
Lines 132 to 141 in 843ab29
```python
def validate_seal(self, header: BlockHeaderAPI) -> None:
    """
    Validate the seal of the given ``header`` according to the Clique consensus rules.
    """
    if header.block_number == 0:
        return

    validate_header_integrity(header, self._epoch_length)

    self._header_cache[header.hash] = header
```
We later flush that as soon as we know that the headers are in the db (we actually keep the last 1000 in memory to account for temporary forks).
py-evm/eth/consensus/clique/snapshot_manager.py
Lines 72 to 80 in 843ab29
```python
def _lookup_header(self, block_hash: Hash32) -> BlockHeader:
    if block_hash in self._header_cache:
        return self._header_cache[block_hash]

    try:
        return self._chain_db.get_block_header_by_hash(block_hash)
    except HeaderNotFound:
        raise ValidationError("Unknown ancestor %s", block_hash)
```
The other way to fix that I see is to override validate_chain with a clique-specific implementation, because that method has access to the entire series, but then this would make the consensus engine operate on the chain level again.
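Purely as an illustration of that alternative, here is a sketch of a chain-level override; the CliqueChain name, the self.clique attribute, and the helper it calls are hypothetical, and the validate_chain signature is assumed rather than copied from py-evm:

```python
from typing import Tuple

from eth.abc import BlockHeaderAPI
from eth.chains.base import Chain


class CliqueChain(Chain):
    def validate_chain(self,
                       root: BlockHeaderAPI,
                       descendants: Tuple[BlockHeaderAPI, ...],
                       seal_check_random_sample_rate: int = 1) -> None:
        # The override has the whole series in hand, so each header can be
        # checked against its in-series parent without a HeaderCache or a
        # database lookup.
        parent = root
        for header in descendants:
            self.clique.validate_seal_with_parent(header, parent)  # hypothetical helper
            parent = header
```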
```diff
 chain = MiningChain.configure(
-    vm_configuration=vms,
+    vm_configuration=clique_vms,
```
So, I guess we would then also have a NoConsensusApplier and a PowConsensusApplier, and the canonical way to activate a chain for a specific type of consensus would be:

```python
Chain.configure(vm_configuration=SomeApplier.amend_vm_configuration(vms))
```

But as I noted in a comment above, this setup seems designed to have each VM maintain its own ConsensusAPI instance, which only seems to work if the ConsensusAPI were stateless.
> So, I guess we would then also have a NoConsensusApplier and a PowConsensusApplier, and the canonical way to activate a chain for a specific type of consensus would be:
We don't seem to need it, because it's so trivial to just set the consensus class for the others, but I'm not really opposed. I think we can have a SimpleConsensusApplier that just changes the consensus_class and is sufficient for both PoW and NoProof.
> which only seems to work if the ConsensusAPI were stateless.
Yeah, I'm not ready to give up on stateless VMs yet.
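For what it's worth, here is a minimal sketch of what a SimpleConsensusApplier along those lines could look like; the amend_vm_configuration name follows the snippet quoted above, and the rest is assumed:

```python
class SimpleConsensusApplier:
    """Applies a single consensus_class to every VM in a vm_configuration.

    Enough for PoW and NoProof, which need no wiring beyond setting the class.
    """

    def __init__(self, consensus_class):
        self.consensus_class = consensus_class

    def amend_vm_configuration(self, vm_configuration):
        return tuple(
            (block_number, vm.configure(consensus_class=self.consensus_class))
            for block_number, vm in vm_configuration
        )


# Hypothetical usage, mirroring the Chain.configure(...) call above:
# vms = SimpleConsensusApplier(PowConsensus).amend_vm_configuration(vm_configuration)
# chain_class = Chain.configure(vm_configuration=vms)
```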
eth/tools/builder/chain/builders.py
```python
        for block_number, vm in chain_class.vm_configuration
    )
    return chain_class.configure(
        vm_configuration=no_pow_vms,
```
Yeah, this seems like a nice place to use that DefaultConsensusApplier concept.
I split it out into two commits; the first should be an unchanged version of #1874. So now it should be easier to see what changes I'm proposing.
7655acd to 2ce1176
Okay, so now the move to the VM is isolated to 2ce1176. The only piece I got stuck on is that somehow the state root changed. The balances of the relevant accounts are the same; I also checked that the balance at the 0 address was empty. So it's not clear what happened there. Two main questions left in my mind: …
It looks like it's "touching" the account at 0, so it previously didn't exist, and then afterwards it exists as an empty account. I'm not sure why this PR would have that effect. Edit: Yup, I was just able to confirm that if you run the test with only the first commit, then account 0 does not get touched.
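For reference, a check along these lines can confirm that observation; it assumes the VM's state object exposes an account_exists helper, which may differ from the exact py-evm API:

```python
from eth_typing import Address

ZERO_ADDRESS = Address(20 * b'\x00')


def zero_account_touched(chain) -> bool:
    # A "touched" but never-funded account exists as an empty account, which is
    # enough to change the state root even though every balance is unchanged.
    state = chain.get_vm().state
    return state.account_exists(ZERO_ADDRESS)
```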
I generally like the direction. However, I'm pretty sure this will crash when we validate a series of headers across VM boundaries.
You might have missed my in-line comment so let me put it here again. So, there are multiple places of caching in the current clique implementation, but let's ignore the ones that are for performance and look at the HeaderCache, which I think is the problematic one. To recall, in Clique you cannot validate the seal of a header unless you have access to the previous header. Let's assume we validate a series of headers via validate_chain(...). These headers may span over multiple VMs and result in a validate_seal(...) call on each of these VMs. By the time that we validate the seal of the second header, the previous header won't be in the database yet. So in order to be able to look it up on subsequent validate_seal calls, we first put it into our HeaderCache when validate_seal is called.
py-evm/eth/consensus/clique/clique.py
Lines 132 to 141 in 843ab29
We later flush that as soon as we know that the headers are in the db.
py-evm/eth/consensus/clique/snapshot_manager.py
Lines 72 to 80 in 843ab29
The other way to fix that I see is to override validate_chain with a clique-specific implementation, because that method has access to the entire series, but then this would make the consensus engine operate on the chain level again.
If we have the … It does also mean that you could not directly call …
Good point! In fact, it looks like neither the current in-VM solution nor the current in-Chain solution will work with … So here's what I'm thinking: …
Well, yes, that's why on the Trinity side we differentiate between …
Your last commit defines:

```python
def validate_extension(
        self,
        new_header: BlockHeaderAPI,
        check_seal: bool = True) -> None:
```

So I guess there's a conflict between what you explained and the code in that commit. I'm going to assume …
I'm also not entirely sure why it needs to be in the database at this point. Isn't the only important thing that the VM has access to it? Shouldn't in-memory access be enough? And am I right to assume that on the Trinity side, instead of messing with … I'm a bit skeptical if we really need all the …

Here's another proposal: We could change

```python
def validate_header(cls,
                    header: BlockHeaderAPI,
                    parent_header: BlockHeaderAPI,
                    check_seal: bool = True
                    ) -> None:
```

to this 👇:

```python
def validate_header(cls,
                    header: BlockHeaderAPI,
                    parent_headers: Iterable[BlockHeaderAPI],
                    check_seal: bool = True
                    ) -> None:
```

And then in … Ok, I have to admit, I have a hard time thinking it all through without actually playing with the code, and it took me a ridiculously long time to even write this response. Tomorrow I'm going to play with the code to see where I end up 🙃
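To make that proposal a bit more concrete, here is a sketch (assumed, not actual py-evm code) of how a Clique-style seal check could consume parent_headers instead of a HeaderCache:

```python
from typing import Dict, Iterable

from eth.abc import BlockHeaderAPI
from eth_utils import ValidationError


def validate_seal_with_parents(header: BlockHeaderAPI,
                               parent_headers: Iterable[BlockHeaderAPI]) -> None:
    # Clique needs the immediate parent to verify the signer. With the proposed
    # signature it can be picked out of the supplied parents instead of being
    # looked up in an in-memory HeaderCache or in the database.
    parents_by_hash: Dict[bytes, BlockHeaderAPI] = {
        parent.hash: parent for parent in parent_headers
    }
    parent = parents_by_hash.get(header.parent_hash)
    if parent is None:
        raise ValidationError(f"Unknown ancestor {header.parent_hash!r}")
    # ...run the actual Clique signer/seal checks against `parent` here...
```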
Ok, I think I'm starting to understand what you meant with: …
But I want to make sure I also understand the consequences of that. Here's what I did now: …
I'm not proposing this as the API, but it felt like the most ad hoc thing I could do to let me play with the general idea of getting rid of the …
It doesn't, and I guess that is where your comment about "the oldest parent must be persisted to the database" comes into play. If I interpret that correctly, it means that the second batch of headers arrives and we are calling … This wasn't a problem in the previous non-stateless model because the … So the thing that I'm wondering about is how this would now affect the syncing code, especially because waiting for the blocks to be imported is the slow part of the sync. Or would we persist the headers immediately after they were validated?
So what I decided while I was fooling around with the code is that most of the time the validation can happen one header at a time, so the multi-parent thing is unnecessary. We can just validate it at ... But we might want to formalize the header-only persist use-case (which we are hacking around for two different scenarios already: Beam Sync and Light Chains). That often persists multiple headers at once, so we would probably want to validate an extension of multiple headers at once. Even then, we wouldn't need to supply multiple parents at the Chain API level (because we are still presuming that the parents of the headers to import are present), but we would need multiple parents at the VM level. Because, presumably, we would validate the whole series of headers before persisting any of them, in order to keep the benefit of batching the writes to the database. So we would have something like:

```python
class Chain:
    def import_headers(self, headers):
        # this only imports headers, trying to access block bodies will still crash after this completes
        for index, header in enumerate(headers):
            vm = self.get_vm(header)
            # pass in any parents that are not already in the database
            parents = headers[:index]
            vm.validate_extension(parents, header)
        self.chaindb.persist_header_chain(headers)
```
I think we'll need a test that … That's why I think it's preferable to add a new validate method that can do stricter validations that only work when all its parents are available, because there are certain checks we simply can't reliably run in the above scenario (like the Clique checks).
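One possible shape for that split, sketched only to illustrate the idea; the class and method names and their placement on the VM are hypothetical:

```python
class HeaderValidatingVM:
    """Illustrative split between cheap and strict header validation."""

    def validate_extension(self, parents, new_header):
        # Cheap checks that are safe even while ancestors live only in memory:
        # header integrity, difficulty/gas-limit rules, timestamps, etc.
        raise NotImplementedError

    def validate_persisted_extension(self, new_header):
        # Stricter checks that require all ancestors to already be persisted,
        # e.g. Clique snapshot/signer checks that walk back through stored headers.
        raise NotImplementedError
```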
Handled in #1899
Exploring a slight variant on #1874
So far, I haven't found any dealbreakers. This is obviously not finished, but I have to wrap for the night, and wanted to share the concept.
I think there is eventually a path to encapsulating everything into the consensus engine, but I don't think it's strictly necessary in this path (where we can still do some patching).
cc @cburgdorf for 👀