Conversation

@bording
Collaborator

@bording bording commented Aug 26, 2016

This is the first PR mentioned in #251.

Currently this is based on my vs-support branch from #248; once that's merged, I'll rebase this against master instead.

f36ca33 is what is new in this PR.

@michaelklishin
Contributor

Thank you!

ConsumerWorkService has to guarantee per-channel dispatch ordering and dispatch in such a way that consumers can use synchronous protocol methods (e.g. queue.declare). Do you think there can be any changes in those areas?

Contributor


For those confused by this magic constant: channel 0 is used for "special purpose" protocol communication.
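For illustration, the guard being discussed boils down to something like the sketch below (the helper name is hypothetical; only the channel-0 convention comes from the AMQP 0-9-1 protocol):

```csharp
// Channel 0 is reserved for connection-level protocol traffic
// (connection.start/tune/open, heartbeats). Consumers can never be
// registered on it, so its model has no need to join the
// ConsumerWorkService in the first place.
private static bool ShouldRegisterWithWorkService(int channelNumber)
{
    return channelNumber != 0;
}
```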

Collaborator Author


Yes, I noticed that the current design causes even the MainSession model to be registered with the ConsumerWorkService, even though no consumers can be registered on it. Seemed like a reasonable optimization to skip those registrations.

Contributor


Makes sense. Please leave a comment explaining that, though.

@bording
Collaborator Author

bording commented Aug 29, 2016

@michaelklishin I pushed up a revised commit with a comment about model 0, let me know if it's ok.

@bording
Collaborator Author

bording commented Aug 29, 2016

@kjnilsson What race condition was 166cf17 trying to fix? I was testing this PR initially against 3.6.5, which didn't have this, and it looks like the ChannelNumber 0 optimization I made here is now throwing an NRE because of it.

Same problem should be occurring on #253.

@bording bording force-pushed the improve-BatchingWorkPool branch from 0694ce8 to c2535fb Compare August 30, 2016 22:13
@bording
Collaborator Author

bording commented Aug 30, 2016

I've rebased this branch on top of the latest vs-support branch changes, and I went ahead and removed the special model 0 handling that was throwing an exception because of the ordering change. Having that in this branch is much less critical.

@kjnilsson
Contributor

The concourse build failed on this test:

Errors and Failures
1) Failed : RabbitMQ.Client.Unit.TestConsumerOperationDispatch.TestDeliveryOrderingWithSingleChannel
  Expected: 96
  But was:  100
at RabbitMQ.Client.Unit.TestConsumerOperationDispatch.TestDeliveryOrderingWithSingleChannel() in /tmp/build/b95f7b06/rabbitmq_dotnet_client/projects/client/Unit/src/unit/TestConsumerOperationDispatch.cs:line 142

@michaelklishin
Contributor

@kjnilsson that's likely a timing issue.

@kjnilsson
Contributor

@michaelklishin yes, just wanted to note it as it isn't yet public. Running the tests again.

@bording bording force-pushed the improve-BatchingWorkPool branch from c2535fb to 41a5574 Compare August 31, 2016 15:55
@bording
Collaborator Author

bording commented Aug 31, 2016

This branch is now rebased against master.

For the failing test, yes that is expected on this PR, because the changes to the BatchingWorkPool can't guarantee ordering. Looking at the test and thinking about it more, I can see how this likely isn't an acceptable trade-off, given how acking of multiple delivery tags works.

@kjnilsson
Contributor

@bording can you explain where the ordering guarantees are given up with your approach?

@michaelklishin
Contributor

If we really give up the ordering guarantee in this PR, it's a no go compared to #253.

@bording
Collaborator Author

bording commented Aug 31, 2016

@kjnilsson Take a look at BatchingWorkPool.AddWorkItem and how it uses the ready queue. It will add an entry in there for every call, so you can end up with multiple NextWorkBlock calls looking at the same model's work.

The existing implementation uses the SetQueue class to prevent the same model from being queued multiple times, but the class isn't thread-safe, and I couldn't find any sort of concurrent version of a collection that had the same behavior.

@Scooletz and I tried to come up with another way to prevent the same model from being queued multiple times, but everything we came up with either required reintroducing locks (and defeating the point of this change) or could potentially run into the opposite problem of having work items and not scheduling a NextWorkBlock call to dispatch them.

When we realized this, we went with a more thorough redesign of the ConsumerWorkService instead, as presented in #253.
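For context, the SetQueue behaviour in question is roughly the following (an illustrative sketch, not the client's actual implementation): a FIFO queue that silently ignores duplicate enqueues, which is what keeps the same model from being scheduled twice, and which is not safe under concurrent access.

```csharp
using System.Collections.Generic;

// Sketch of a set-backed queue: enqueueing an element that is
// already present is a no-op, so a model can appear at most once
// in the ready queue at any time.
public class SetQueue<T>
{
    private readonly Queue<T> queue = new Queue<T>();
    private readonly HashSet<T> members = new HashSet<T>();

    // Returns false (and does nothing) if the item is already queued.
    // Note: NOT thread-safe. Queue<T> and HashSet<T> must not be
    // mutated concurrently, so AddWorkItem/NextWorkBlock callers
    // would need external locking -- which is the crux of the
    // trade-off discussed above.
    public bool Enqueue(T item)
    {
        if (!members.Add(item)) return false;
        queue.Enqueue(item);
        return true;
    }

    public bool TryDequeue(out T item)
    {
        if (queue.Count == 0) { item = default(T); return false; }
        item = queue.Dequeue();
        members.Remove(item);
        return true;
    }
}
```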

@michaelklishin
Contributor

michaelklishin commented Aug 31, 2016

@bording then it sounds like you and @Scooletz consider #253 to be fairly objectively superior to this PR?

@bording
Collaborator Author

bording commented Aug 31, 2016

@michaelklishin If ordering is a hard requirement that can't be relaxed, then I think #253 is the only option, unless there's some way we haven't thought of to bring it back to this PR without reintroducing a lock.

I only realized today that it might not be possible to give up ordering because of the multiple delivery tag ack option. We don't use that in our code at all, and I had forgotten that it was even part of the spec. 😢
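A minimal sketch of why multiple-acking depends on dispatch order (illustrative only; the consumer signature matches the 3.6-era client API). If deliveries can be dispatched out of order, acking "everything up to tag N" may acknowledge a message the consumer has not actually processed yet:

```csharp
using RabbitMQ.Client;

// basic.ack with multiple:true acknowledges every outstanding
// delivery up to and including deliveryTag in a single frame.
// That only does what the consumer intends if deliveries were
// handled in tag order -- hence ordering is load-bearing here.
public class BatchAckingConsumer : DefaultBasicConsumer
{
    public BatchAckingConsumer(IModel model) : base(model) { }

    public override void HandleBasicDeliver(string consumerTag,
        ulong deliveryTag, bool redelivered, string exchange,
        string routingKey, IBasicProperties properties, byte[] body)
    {
        // Process the message, then ack it and all earlier
        // unacked deliveries at once.
        Model.BasicAck(deliveryTag, multiple: true);
    }
}
```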

If ordering weren't a requirement, then I would actually prefer this approach, because it maintains the ability to pass in a custom task scheduler for concurrency throttling, and since it doesn't use dedicated threads that block when they have no work, you see a lot less "synchronization time" when profiling compared to #253.

Given that this approach does have some benefits, that was why we initially presented both options.

@michaelklishin
Contributor

We cannot give up ordering for backwards compatibility, too. I think the numbers you posted from #253 are very comparable. Not sure how many users provide a custom task scheduler.

@kjnilsson
Contributor

kjnilsson commented Sep 1, 2016

Ok, I've had a play with the PR to see if I can get TestDeliveryOrderingWithSingleChannel to pass somehow, but even after adding back the SetQueue (or the SetQueue behaviour, rather) and adding a bunch of relatively coarse-grained locks, the test still fails. I suspect something else may be going on that isn't immediately obvious.

I now agree we should proceed with #253. It does add a breaking change to a public interface (hello v5!) and a fair number of internal changes as well, but the approach is simple and easy to reason about. I don't think that many users provide a custom TaskScheduler, and with a one-to-one relationship between models and threads it may not be necessary.
It also looks like it may allocate less than creating tasks recursively, which would be nice. Once #253 is rebased I will review it properly and test it.

@michaelklishin
Contributor

Speaking of versions, we should probably release 4.1.0 with it (the API change is not going to affect too many).

@michaelklishin
Contributor

Closing this in favour of #253 based on a number of discussions in #251, #253 and here.

@bording bording deleted the improve-BatchingWorkPool branch October 24, 2016 18:40