Conversation

@bording
Collaborator

@bording bording commented Aug 26, 2016

This is the first PR mentioned in #251.

Currently this is based on my vs-support branch from #248; once that's merged, I'll rebase this against master instead.

f36ca33 is what is new in this PR.

@michaelklishin
Contributor

Thank you!

ConsumerWorkService has to guarantee per-channel dispatch ordering and dispatch in such a way that consumers can use synchronous protocol methods (e.g. queue.declare). Do you think there can be any changes in those areas?

Contributor


For those confused by this magic constant: channel 0 is used for "special purpose" protocol communication.
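For illustration, the guard being discussed boils down to something like the sketch below (the helper name is hypothetical; only the channel-0 convention comes from the AMQP 0-9-1 protocol):

```csharp
// Channel 0 is reserved for connection-level protocol traffic
// (connection.start/tune/open, heartbeats). Consumers can never be
// registered on it, so its model has no need to join the
// ConsumerWorkService in the first place.
private static bool ShouldRegisterWithWorkService(int channelNumber)
{
    return channelNumber != 0;
}
```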

Collaborator Author


Yes, I noticed that the current design causes even the MainSession model to be registered with the ConsumerWorkService, even though no consumers can be registered on it. Seemed like a reasonable optimization to skip those registrations.

Contributor


Makes sense. Please leave a comment explaining that, though.

@bording
Collaborator Author

bording commented Aug 29, 2016

@michaelklishin I pushed up a revised commit with a comment about model 0, let me know if it's ok.

@bording
Collaborator Author

bording commented Aug 29, 2016

@kjnilsson What race condition was 166cf17 trying to fix? I was testing this PR initially against 3.6.5, which didn't have this, and it looks like the ChannelNumber 0 optimization I made here is now throwing an NRE because of it.

Same problem should be occurring on #253.

@bording bording force-pushed the improve-BatchingWorkPool branch from 0694ce8 to c2535fb Compare August 30, 2016 22:13
@bording
Collaborator Author

bording commented Aug 30, 2016

I've rebased this branch on top of the latest vs-support branch changes, and I went ahead and removed the special model 0 handling that was throwing an exception because of the ordering change. Having that in this branch is much less critical.

@kjnilsson
Contributor

The concourse build failed on this test:

Errors and Failures
1) Failed : RabbitMQ.Client.Unit.TestConsumerOperationDispatch.TestDeliveryOrderingWithSingleChannel
  Expected: 96
  But was:  100
at RabbitMQ.Client.Unit.TestConsumerOperationDispatch.TestDeliveryOrderingWithSingleChannel() in /tmp/build/b95f7b06/rabbitmq_dotnet_client/projects/client/Unit/src/unit/TestConsumerOperationDispatch.cs:line 142

@michaelklishin
Contributor

@kjnilsson that's likely a timing issue.

@kjnilsson
Contributor

@michaelklishin yes, just wanted to note it as it isn't yet public. Running the tests again.

@bording bording force-pushed the improve-BatchingWorkPool branch from c2535fb to 41a5574 Compare August 31, 2016 15:55
@bording
Collaborator Author

bording commented Aug 31, 2016

This branch is now rebased against master.

For the failing test, yes that is expected on this PR, because the changes to the BatchingWorkPool can't guarantee ordering. Looking at the test and thinking about it more, I can see how this likely isn't an acceptable trade-off, given how acking of multiple delivery tags works.

@kjnilsson
Contributor

@bording can you explain where the ordering guarantees are given up with your approach?

@michaelklishin
Contributor

If we really give up the ordering guarantee in this PR, it's a no go compared to #253.

@bording
Collaborator Author

bording commented Aug 31, 2016

@kjnilsson Take a look at BatchingWorkPool.AddWorkItem and how it uses the ready queue. It will add an entry in there for every call, so you can end up with multiple NextWorkBlock calls looking at the same model's work.

The existing implementation uses the SetQueue class to prevent the same model from being queued multiple times, but the class isn't thread-safe, and I couldn't find any sort of concurrent version of a collection that had the same behavior.

@Scooletz and I tried to come up with another way to prevent the same model from being queued multiple times, but everything we came up with either required reintroducing locks (and defeating the point of this change) or could potentially run into the opposite problem of having work items and not scheduling a NextWorkBlock call to dispatch them.

When we realized this, we went with a more thorough redesign of the ConsumerWorkService instead, as presented in #253.
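For context, the SetQueue behaviour in question is roughly the following (an illustrative sketch, not the client's actual implementation): a FIFO queue that silently ignores duplicate enqueues, which is what keeps the same model from being scheduled twice, and which is not safe under concurrent access.

```csharp
using System.Collections.Generic;

// Sketch of a set-backed queue: enqueueing an element that is
// already present is a no-op, so a model can appear at most once
// in the ready queue at any time.
public class SetQueue<T>
{
    private readonly Queue<T> queue = new Queue<T>();
    private readonly HashSet<T> members = new HashSet<T>();

    // Returns false (and does nothing) if the item is already queued.
    // Note: NOT thread-safe. Queue<T> and HashSet<T> must not be
    // mutated concurrently, so AddWorkItem/NextWorkBlock callers
    // would need external locking -- which is the crux of the
    // trade-off discussed above.
    public bool Enqueue(T item)
    {
        if (!members.Add(item)) return false;
        queue.Enqueue(item);
        return true;
    }

    public bool TryDequeue(out T item)
    {
        if (queue.Count == 0) { item = default(T); return false; }
        item = queue.Dequeue();
        members.Remove(item);
        return true;
    }
}
```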

@michaelklishin
Contributor

michaelklishin commented Aug 31, 2016

@bording then it sounds like you and @Scooletz consider #253 to be fairly objectively superior to this PR?

@bording
Collaborator Author

bording commented Aug 31, 2016

@michaelklishin If ordering is a hard requirement that can't be relaxed, then I think #253 is the only option, unless there's some way we haven't thought of to bring it back to this PR without reintroducing a lock.

I only realized today that it might not be possible to give up ordering because of the multiple delivery tag ack option. We don't use that in our code at all, and I had forgotten that it was even part of the spec. 😢
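A minimal sketch of why multiple-acking depends on dispatch order (illustrative only; the consumer signature matches the 3.6-era client API). If deliveries can be dispatched out of order, acking "everything up to tag N" may acknowledge a message the consumer has not actually processed yet:

```csharp
using RabbitMQ.Client;

// basic.ack with multiple:true acknowledges every outstanding
// delivery up to and including deliveryTag in a single frame.
// That only does what the consumer intends if deliveries were
// handled in tag order -- hence ordering is load-bearing here.
public class BatchAckingConsumer : DefaultBasicConsumer
{
    public BatchAckingConsumer(IModel model) : base(model) { }

    public override void HandleBasicDeliver(string consumerTag,
        ulong deliveryTag, bool redelivered, string exchange,
        string routingKey, IBasicProperties properties, byte[] body)
    {
        // Process the message, then ack it and all earlier
        // unacked deliveries at once.
        Model.BasicAck(deliveryTag, multiple: true);
    }
}
```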

If ordering weren't a requirement, then I would actually prefer this approach, because it maintains the ability to pass in a custom task scheduler for concurrency throttling, and since it doesn't use dedicated threads that block when they have no work, you see a lot less "synchronization time" when profiling compared to #253.

Given that this approach does have some benefits, that was why we initially presented both options.

@michaelklishin
Contributor

We cannot give up ordering for backwards compatibility, too. I think the numbers you posted from #253 are very comparable. Not sure how many users provide a custom task scheduler.

@kjnilsson
Contributor

kjnilsson commented Sep 1, 2016

Ok, I've had a play with the PR to see if I can get TestDeliveryOrderingWithSingleChannel to pass somehow, but even after adding back the SetQueue (or the SetQueue behaviour, rather) and adding a bunch of relatively coarse-grained locks, the test still fails. I suspect something else may be going on that isn't immediately obvious.

I now agree we should proceed with #253. It does add a breaking change to a public interface (hello v5!) and a fair number of internal changes as well, but the approach is simple and easy to reason about. I don't think that many users provide a custom TaskScheduler, and with a one-to-one relationship between models and threads it may not be necessary.
It also looks like it may allocate less than creating tasks recursively, which would be nice. Once #253 is rebased I will review it properly and test it.

@michaelklishin
Contributor

Speaking of versions, we should probably release 4.1.0 with it (the API change is not going to affect too many).

@michaelklishin
Contributor

Closing this in favour of #253 based on a number of discussions in #251, #253 and here.

@bording bording deleted the improve-BatchingWorkPool branch October 24, 2016 18:40