swarm/network: remove dead code #1339
    case *ChunkDeliveryMsgSyncing:
        msg = (*ChunkDeliveryMsg)(r)
        mode = chunk.ModePutSync
    case *ChunkDeliveryMsg:
One of the tests is actually sending a ChunkDeliveryMsg and not the other two specific types, therefore failing.
This is not code that is hit in production, but I decided to add this here temporarily, rather than dig into the test.
It is not clear why this test was not failing with the previous code.
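For readers without the diff at hand, here is a minimal, self-contained sketch of the pattern the hunk above uses: a type switch that converts the specialised message types back to the generic one and picks a put mode. `PutMode`, `ChunkDeliveryMsgRetrieval`, `ModePutRequest`, `classify`, and the mode chosen for the plain `*ChunkDeliveryMsg` case are assumptions made for illustration; only the `*ChunkDeliveryMsgSyncing` branch is taken from the hunk.

```go
package main

import "fmt"

// PutMode is a hypothetical stand-in for the chunk store's put modes.
type PutMode int

const (
	ModePutRequest PutMode = iota
	ModePutSync
)

// ChunkDeliveryMsg is the generic delivery message.
type ChunkDeliveryMsg struct {
	Addr  []byte
	SData []byte
}

// The specialised types share the same underlying struct, so a pointer to one
// can be converted to *ChunkDeliveryMsg without copying.
type ChunkDeliveryMsgRetrieval ChunkDeliveryMsg // assumed name
type ChunkDeliveryMsgSyncing ChunkDeliveryMsg

// classify returns the generic message and the put mode for a received request.
func classify(req interface{}) (*ChunkDeliveryMsg, PutMode, error) {
	switch r := req.(type) {
	case *ChunkDeliveryMsgRetrieval:
		return (*ChunkDeliveryMsg)(r), ModePutRequest, nil
	case *ChunkDeliveryMsgSyncing:
		return (*ChunkDeliveryMsg)(r), ModePutSync, nil
	case *ChunkDeliveryMsg:
		// Temporary fallback for the test that still sends the generic type.
		// The mode used here is an assumption of this sketch.
		return r, ModePutSync, nil
	default:
		return nil, 0, fmt.Errorf("unexpected message type %T", req)
	}
}

func main() {
	msg, mode, err := classify(&ChunkDeliveryMsgSyncing{Addr: []byte{1}})
	fmt.Println(msg.Addr, mode, err)
}
```

The conversion compiles because both named types have an identical underlying struct type; no validation happens at this level, which matches the later remark that validation lives in localstore.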
    )
    streamer.delivery.RequestFromPeers(ctx, req)

    stream := NewStream(swarmChunkServerStreamName, "", true)
SwarmChunkServer doesn't exist anymore, and RETRIEVE_REQUEST doesn't exist anymore.
Also this is a huge test, which only checks that if we issue a request to RequestFromPeers, we get specific messages back - SubscribeMsg for the RETRIEVE_REQUEST stream, and a RetrieveRequestMsg - this is rather useless for two reasons:
- We are locking behaviour that doesn't provide much value.
- We are testing very basic functionality, which is hit 1000 times in a single integration test.
No need to lock this behaviour in such a complicated way. Not to mention that the RETRIEVE_REQUEST stream has served no purpose in any production code for a long time.
    Code: 6,
    Msg: &ChunkDeliveryMsg{
        Addr:  hash,
        SData: hash,
How that worked before is beyond me... Don't we have any validation for ChunkDeliveryMsg?
further up in localstore i guess there is. not on msg level
true, there is validation in localstore, i just found this confusing here. now that you mention it, it makes sense.
janos
left a comment
There is a lot of work in this PR, thanks Anton.
swarm/network/stream/stream.go
Outdated
        return pstreams
    }

    func (api *API) GetPeerClientSubscriptions() map[string][]string {
Maybe write a test for this function as well?
This function is trivial, just like the GetPeerServerSubscriptions. I don't think we need tests for it, chances that someone will refactor it and break it are so slim, that I'd rather we don't add 50 lines of tests like the other one.
Also this is not an API call that is part of critical functionality, it is here for debug purposes.
If we add a unit test for this, we will probably have to remove it once we review the Stream implementation - something that has been under discussion over the last weeks.
Actually it is not 50 lines, but much more, we also have TestGetServerSubscriptionsRPC
TestGetServerSubscriptionsRPC depends on:
- simulations package
- adapters
- netStore
- delivery
- spec for SubscribeMsg
- json snapshots
- exact naming of the function, not covered by the compiler but dynamically linked
I'd rather we don't write such low-value tests for such trivial functionalities.
Well, the tools that can be used for testing the code should not be relevant to the decision whether the code should be tested. If this code exists and somebody refactors the stream package, I suppose that it would be nice to have a test to verify that there is no regression. I would not block this PR for not testing this code if it is not an issue for others.
@janos I strongly disagree with this.
If you are using 10 libraries to test a for loop that is adding values to a slice, you are increasing the complexity of the codebase for no good reason.
Now that these tests are part of the codebase, every time we change any of those 10 libraries, we also have to modify the tests. This is a cost to development that we should not ignore. The benefit of this test is that we are sure that we have not broken the for loop that is adding values to a slice - rather low-value in my opinion.
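To make the "for loop that is adding values to a slice" concrete, here is a hedged sketch of the shape such a debug helper could take. It is not the actual `GetPeerClientSubscriptions` body; the `peer` type, its `clients` field, and `clientSubscriptions` are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// peer is a hypothetical stand-in for a stream peer; the real type holds much more state.
type peer struct {
	id      string
	clients map[string]struct{} // client subscriptions, keyed by stream name
}

// clientSubscriptions returns, per peer ID, the names of the streams
// the peer is subscribed to as a client.
func clientSubscriptions(peers []*peer) map[string][]string {
	out := make(map[string][]string, len(peers))
	for _, p := range peers {
		names := make([]string, 0, len(p.clients))
		for name := range p.clients {
			names = append(names, name)
		}
		sort.Strings(names) // map iteration order is random; sort for stable output
		out[p.id] = names
	}
	return out
}

func main() {
	peers := []*peer{
		{id: "peer-1", clients: map[string]struct{}{"stream-b": {}, "stream-a": {}}},
	}
	fmt.Println(clientSubscriptions(peers))
}
```

A helper of this shape has no dependencies beyond the standard library, which is the crux of the disagreement above: the cost comes from the test harness around it, not from the loop itself.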
Thanks for the explanations. I will not insist on this test.
@janos giving a bit more thought to this particular piece of code - the sole purpose of this API call is to be a helper method for debugging the already broken subscriptions implementation.
I'd rather we focus on solving the bugs in the subscriptions implementation (without commenting what this involves - rewrite or bug fix or whatever), than write tests for the helper methods we add to the codebase to help us track the bigger issues in Swarm.
This feels like adding a test for a metrics counter or timer to me.
So if we insist on having a test for this API (which I see you don't in the comment above), I'd rather just remove it altogether, and find another way to debug the subscriptions functionality.
ba4cb04 to
ebab145
Compare
Thanks for addressing my comments Anton.
@janos thanks for the thorough review and for putting up with my rushed PR!
2bb87ae to
4bb2f0b
Compare
are you sure light node functionality does not regress here?
changes here are maybe too big, too hasty ?
    // TODO: may need to implement protocol drop only? don't want to kick off the peer
    // if they are useful for other protocols
    -func (p *Peer) Drop(err error) {
    +func (p *Peer) Drop() {
can we please preserve the error and log it or even put it in p2p.DiscSubprotocolError?
this is losing too much contextual info
i agree this could be converted to:
    func (p *Peer) Drop(err error) {
        if err == nil { err = p2p.DiscSubprotocolError }
        p.Disconnect(err)
    }
We could log the error, but I'd rather not propagate it down to p2p, and replace DiscSubprotocolError just now, as this is changing behaviour, and we already have too many changes.
The current change preserves the behavior we already have.
@justelad I am not changing this behavior in this PR either, let's keep it as small as possible.
Is removing the retrieve request stream justified because skipcheck is true in current production code? There was a reason why skipcheck was a (tested) option for the retrieve request stream: we wanted to check whether, in some scenarios, chunk delivery for retrieve requests should go via the offered/wanted hash roundtrip. While this may introduce latency, without it multiple successful requests will lead to multiple deliveries.
At this point I am leaning towards handling the latter issue with request cancellation (#409), which also has the chance to cancel requests for all forwarding nodes.
In light of these changes, the question arises: why are retrieve requests even part of the stream package and stream protocol? It would make sense to remove request handling entirely from stream, put it under network, and make it part of the bzz protocol directly.
|
@zelig good question. I think code that is not used by the users, but only hit in unit tests, has no place in the codebase as a general rule of thumb. Currently we have:
None of this is necessary for core Swarm functionality and is increasing the To summarize, the bugs we've found during this effort, that were 'hidden' due to this non-used functionality:
Because of all this I think it is justified to remove this dead code. Regarding the light client - it is possible that this is breaking a very small part of it, but fixing it should be rather simple. |
RetrieveRequests in the simplified version of the code have nothing to do with Streams. The only thing they share is the Fetchers, so that we have only 1 request per chunk, no matter if this request was done through a RetrieveRequest or through Syncing. You have a point that this doesn't even belong in this package, but I have a hard time getting even simpler changes in, so I'd rather we change this gradually.
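The "only 1 request per chunk" idea can be illustrated with a small sketch, assuming a map of in-flight requests keyed by chunk address. This is not the Fetchers implementation from the codebase; `fetchers`, `getOrCreate`, and `deliver` are hypothetical names, and addresses are plain strings for brevity.

```go
package main

import (
	"fmt"
	"sync"
)

// fetchers tracks in-flight requests by chunk address.
type fetchers struct {
	mu       sync.Mutex
	inflight map[string]chan struct{}
}

func newFetchers() *fetchers {
	return &fetchers{inflight: make(map[string]chan struct{})}
}

// getOrCreate returns the completion channel for addr and reports whether the
// caller created it, i.e. whether the caller should send the network request.
func (f *fetchers) getOrCreate(addr string) (done chan struct{}, created bool) {
	f.mu.Lock()
	defer f.mu.Unlock()
	if ch, ok := f.inflight[addr]; ok {
		return ch, false
	}
	ch := make(chan struct{})
	f.inflight[addr] = ch
	return ch, true
}

// deliver marks the chunk as arrived and wakes every waiter.
func (f *fetchers) deliver(addr string) {
	f.mu.Lock()
	defer f.mu.Unlock()
	if ch, ok := f.inflight[addr]; ok {
		close(ch)
		delete(f.inflight, addr)
	}
}

func main() {
	f := newFetchers()
	_, first := f.getOrCreate("c0ffee")  // e.g. triggered by a retrieve request
	_, second := f.getOrCreate("c0ffee") // e.g. triggered by syncing
	fmt.Println(first, second)
	f.deliver("c0ffee")
}
```

Whichever path asks first issues the request; the other path only waits on the shared channel, so the origin of the request (retrieval or syncing) does not matter.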
Moving retrieve requests from the stream package is a very good idea. It would just need a different protocol as it is now part of the stream protocol specification.
Yes, @janos, that's correct. Let's postpone this until we have all the known bug fixes integrated in the codebase, together with a solution for the subscription bug. Refactoring RetrieveRequests out of the Stream protocol is rather low priority right now, in my opinion.
I agree, @nonsense.
acud
left a comment
@nonsense great work. i left a few minor comments
    // This test checks the default behavior of the server, that is
    // when it is serving Retrieve requests.
    func TestLigthnodeRetrieveRequestWithRetrieve(t *testing.T) {
I think that the registryOptions is supposed to indicate that this is light node behavior, but I see what you mean: it is not so clear what is being tested here.
    -return p.handleOfferedHashesMsg(ctx, msg)
    +go func() {
    +    err := p.handleOfferedHashesMsg(ctx, msg)
    +    if err != nil {
i am good with leaving these functions as is right now for clarity.
if we keep the error (and actually use it) in peer.Drop we could drop the log because the error will be automatically logged on the peer drop
|
There are many discussions about Adding Additionally We should first make sure that Swarm works well in a controlled environment, and then add tests that cover |
OK I see where you're going with this and I tend to agree. I never understood why we need to drop a peer (the p2p package negotiation should already cover protocol version discrepancies) without actually maintaining a peer blacklist (whether a decaying one to mitigate DoS vectors or a permanent one for litigation/other cryptoeconomic violations).
@justelad yes, dropping peers without maintaining a blacklist also doesn't make sense to me. Maybe a node is misbehaving because of a temporary network problem; that is one case where you might want to ignore it for a while, but try it again later. Bottom line - this functionality needs more work (backoff, blacklists, etc.), and in its current form it is more harmful than useful.
@nonsense i must say i agree with everything you responded.
|
PR has two approvals, so merging for now. This is not going to However I think it is more important to integrate the rest of the bugfixes, so that we have a more stable full swarm node, and only then worry about the rest. |
|
@zelig about the pointlessness of the features - the latencies that we have on the I very much welcome any improvements on those latencies - whether through a benchmark test or through an experiment between two deployments, or some other way, but definitely not just based on intuition, unless it is very easy argument to win everyone on the team on. |
This PR is:
- SwarmChunkServer
- skipCheck - this will be a longer effort
- RETRIEVE_REQUEST stream, which is not used in any production code for anything.
- master code has been working for months.