This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Conversation

@Wraith2
Contributor

@Wraith2 Wraith2 commented Apr 24, 2019

relates to https://github.com/dotnet/corefx/issues/30430 and https://github.com/dotnet/corefx/issues/25132

Profiling shows that getting SQL connections out of the pool can hit hot locks and potentially limit maximum throughput. I've used the DataAccessPerformance benchmark, which is based on the TechEmpower benchmarks, to review the behaviour and make some small changes which roughly halve the contention.

before: [contention profiler snapshot: dataaccessperf-master]

after: [contention profiler snapshot: dataaccessperf-locking]

  • In DbConnectionFactory I changed from using lock to a ReaderWriterLockSlim, which completely eliminates Set_Connection from the contention in this benchmark (a sketch of the pattern follows this list).
  • In DbConnectionPoolGroup I've changed the state transition to be lock free, which will matter when pooling is not enabled. I've attempted to reduce the work done under the lock when creating a pool, but I don't believe it can be eliminated entirely: if you don't start the pool inside the lock you may end up in a highly contended scenario starting a large number of pools, each of which then tries to create a lot of connections, which would be a massive drain on resources when only one of those pools can actually be used. From what I can see, Startup() must be called as rarely as possible.
  • Some of the ReadAsync and ExecuteDbReaderAsync contention will be improved by another PR I have in the pipeline.
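
Roughly, the DbConnectionFactory pattern now looks like the sketch below. This is not the literal diff: the types are stand-ins for the internal corefx ones and the signature is simplified, it just shows the read-mostly/write-rarely split.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Stand-ins for the internal corefx types discussed in this thread.
internal sealed class DbConnectionPoolKey { /* pool key details elided */ }
internal sealed class DbConnectionPoolGroup { /* pool group details elided */ }

internal sealed class DbConnectionFactorySketch
{
    private readonly ReaderWriterLockSlim _poolGroupsLock = new ReaderWriterLockSlim();
    private Dictionary<DbConnectionPoolKey, DbConnectionPoolGroup> _connectionPoolGroups =
        new Dictionary<DbConnectionPoolKey, DbConnectionPoolGroup>();

    internal DbConnectionPoolGroup GetOrAddPoolGroup(DbConnectionPoolKey key, Func<DbConnectionPoolGroup> create)
    {
        // Fast path: concurrent readers are allowed, unlike with a plain lock.
        _poolGroupsLock.EnterReadLock();
        try
        {
            if (_connectionPoolGroups.TryGetValue(key, out DbConnectionPoolGroup existing))
            {
                return existing;
            }
        }
        finally
        {
            _poolGroupsLock.ExitReadLock();
        }

        // Slow path (rare): take the exclusive lock, re-check, then publish a
        // copied dictionary so readers never observe a partially built one.
        _poolGroupsLock.EnterWriteLock();
        try
        {
            if (!_connectionPoolGroups.TryGetValue(key, out DbConnectionPoolGroup group))
            {
                group = create();
                var updated = new Dictionary<DbConnectionPoolKey, DbConnectionPoolGroup>(_connectionPoolGroups)
                {
                    [key] = group
                };
                _connectionPoolGroups = updated;
            }
            return group;
        }
        finally
        {
            _poolGroupsLock.ExitWriteLock();
        }
    }
}
```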

Functional, manual, and load testing with the benchmark all pass without problems. This is all about locking, though, so please review carefully.
/cc all the area people @afsanehr, @tarikulsabbir, @Gary-Zh , @David-Engel and interested people @saurabh500 @roji

@roji
Member

roji commented Apr 24, 2019

Am a total outsider to SqlClient, so take the following with a very large grain of salt:

  • DbConnectionFactory.GetConnectionPoolGroup(): the changes here seem to affect the scenario when a connection pool group isn't found. Doesn't this only happen once per pool key, which is supposed to be very rare? If so, I wouldn't expect the locking optimization inside to have any effect...
  • Assuming I've misunderstood and the method is on a perf hot path... The method seems to copy the entire _connectionPoolGroups to a new instance, add the new group to it and assign the new dictionary back to _connectionPoolGroups. I'm wondering why it's not possible to simply add the new group (and simply use ConcurrentDictionary?).
  • If I've understood the above and we need to copy and replace the dictionary every time, then it seems like a classic case for a simple CAS loop (i.e. use Interlocked.CompareExchange() repeatedly until replacing the dictionary succeeds; see the sketch after this list). @Wraith2 if you don't feel comfortable with this kind of programming I can submit a PR doing that.
  • In fact, it would seem that the code as it is (and after this PR) has a race condition, and the "disabled pool entry discovered" assertion could be tripped... Simply assigning _connectionPoolGroups doesn't ensure that the new copy is seen by threads running on other cores. So two threads may have already failed to find the key in the first TryGetValue check; one takes the lock and adds the new entry, and the other one finds it in the inner TryGetValue and trips the assert. In release build the logic seems OK, but the whole thing is over-elaborate and seems like it would be much simpler with a CAS loop.
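
For the CAS option, I mean something along these lines. The field and type names mirror the thread (stand-in types as in the sketch in the PR description), AddPoolGroup is hypothetical, and this is a sketch rather than a proposed diff:

```csharp
// Re-read the field, copy, add the new group, and publish the copy with
// Interlocked.CompareExchange, retrying if another thread won the race.
// Requires System.Collections.Generic and System.Threading.
private Dictionary<DbConnectionPoolKey, DbConnectionPoolGroup> _connectionPoolGroups =
    new Dictionary<DbConnectionPoolKey, DbConnectionPoolGroup>();

private DbConnectionPoolGroup AddPoolGroup(DbConnectionPoolKey key, DbConnectionPoolGroup newGroup)
{
    while (true)
    {
        Dictionary<DbConnectionPoolKey, DbConnectionPoolGroup> current = _connectionPoolGroups;

        // Another thread may already have added a group for this key; reuse it.
        if (current.TryGetValue(key, out DbConnectionPoolGroup existing))
        {
            return existing;
        }

        var updated = new Dictionary<DbConnectionPoolKey, DbConnectionPoolGroup>(current)
        {
            [key] = newGroup
        };

        // Publish only if nobody swapped the dictionary since we read it.
        if (Interlocked.CompareExchange(ref _connectionPoolGroups, updated, current) == current)
        {
            return newGroup;
        }
        // Lost the race: loop and retry against the latest dictionary.
    }
}
```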

@roji
Member

roji commented Apr 24, 2019

One additional issue with this PR: _connectionPoolGroups is also accessed from PruneConnectionPoolGroups(), which hasn't been modified and continues to lock this - whatever locking changes are done to GetConnectionPoolGroup() need to be done there too...

@Wraith2
Contributor Author

Wraith2 commented Apr 24, 2019

DbConnectionFactory.GetConnectionPoolGroup(): the changes here seem to affect the scenario when a connection pool group isn't found. Doesn't this only happen once per pool key, which is supposed to be very rare? If so, I wouldn't expect the locking optimization inside to have any effect...

The contention happens at startup, when you go from no connections to as many as you can open, and this lock is what blocks every connection through that pool until it has been created. Allowing every caller to try to set up a pool and letting the first assigner win would spin up huge numbers of connections, so you really do need the lock, but I've tried to minimize what's done under it.

You're right on the connection groups assignment; the lock was providing a barrier and I'd neglected to add one. I've changed it to use Interlocked for the assignment.
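
In rough shape (placeholder names, not the exact locals in the diff):

```csharp
// newConnectionPoolGroups is the rebuilt copy; publishing it through Interlocked
// (or Volatile.Write) gives the release barrier so other cores never see a
// half-initialized dictionary. Names are placeholders, not the PR's actual locals.
Interlocked.Exchange(ref _connectionPoolGroups, newConnectionPoolGroups);
// equivalently: Volatile.Write(ref _connectionPoolGroups, newConnectionPoolGroups);
```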

@roji
Member

roji commented Apr 24, 2019

@Wraith2 so this is about going from zero connections to one connection? Because it seems like once the pool group is created for a given key we'd never go through that code again?

Regarding doing CAS (or ConcurrentDictionary), are you saying that the creation of the DbConnectionPoolGroup itself creates connections, i.e. that we should avoid instantiating DbConnectionPoolGroup and throwing them away?

BTW my previous comment about the race condition was wrong, since the assignment occurs within the lock, so the change does get propagated. I'm not 100% sure about ReaderWriterLockSlim, so making sure is a good idea.

@roji
Member

roji commented Apr 24, 2019

Regarding doing CAS (or ConcurrentDictionary), are you saying that the creation of the DbConnectionPoolGroup itself creates connections, i.e. that we should avoid instantiating DbConnectionPoolGroup and throwing them away?

Exploring the code a bit, I don't see any issue with a CAS loop which would create a new DbConnectionPoolGroup and possibly throw it away:

  • The constructor for DbConnectionPoolGroup does almost nothing
  • CreateConnectionPoolGroupProviderInfo() also seems very light (the SqlClient override)

Other than that nothing actually seems to be happening...

@Wraith2
Contributor Author

Wraith2 commented Apr 24, 2019

Regarding doing CAS (or ConcurrentDictionary), are you saying that the creation of the DbConnectionPoolGroup itself creates connections, i.e. that we should avoid instantiating DbConnectionPoolGroup and throwing them away?

DbConnectionPoolGroup.GetConnectionPool
Creating a pool sets up some events, a semaphore, and some arrays; not trivial, but all local. Once you call Startup() the pool checks whether it needs to be populated and may initiate connections (only one to start with, since there can't be waiting requests yet). But because the number of concurrent callers to the method is unbounded (not in practice, but there's no reason it couldn't be absurdly high), you could end up with a large number of callers initiating connections, and then all but one of them would lose the race and have to shut them down, which consumes sockets and P/Invoke time in both directions. The original code avoided this by creating the pool under the lock, which isn't terrible, but if you're OK with losing a few objects you can do it outside the lock and make the overall contention period smaller. The effect is small, so if it's not satisfactory we can always just drop this change.
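
As a rough sketch of that trade-off (a single hypothetical _pool field and CreatePool() helper to keep it short; the real group keys pools by identity, so this is illustrative only):

```csharp
// Build the pool outside the lock: cheap and entirely local. Publish the winner
// atomically and only call Startup() on the pool that actually won, so losers
// never open sockets; they are just discarded. Shutdown() stands for whatever
// cleanup a losing pool needs.
DbConnectionPool candidate = CreatePool(); // hypothetical helper; constructor args elided

DbConnectionPool winner = Interlocked.CompareExchange(ref _pool, candidate, null) ?? candidate;

if (ReferenceEquals(winner, candidate))
{
    // We won the race: this is the only pool that ever starts opening connections.
    candidate.Startup();
}
else
{
    // Lost the race: throw the local object away before it does any real work.
    candidate.Shutdown();
}
return winner;
```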

You can't use CompareExchange on this because you can't get a ref to the internals of the dictionary. ConcurrentDictionary is pretty well optimized so I'm not too worried about its perf; it'll be fast if it can be.

DbConnectionFactory.GetConnectionPoolGroup
It does an outer check to see if it can find a pool that isn't disabled. If it can't, it later does an inner check to see if it can find a pool at all; at that point we know it should be disabled, so if we find one we can transfer some information out of it into a new pool that will be enabled. Why? Without knowing the reason the code was written like this I couldn't say, but I interpret it as deliberate rather than poor, so lacking information on the use case I'm being conservative about making changes to it.

The prune call does still lock, but only with itself, and if you're hitting prune you're not under high contention on that pool, because something in it has been allowed to sit idle long enough to be pruned.

@roji
Member

roji commented Apr 24, 2019

@Wraith2 I admit I haven't looked at DbConnectionPoolGroup.GetConnectionPool or the changes you made to it (I haven't had time). I'm only referring to DbConnectionFactory.GetConnectionPoolGroup.

It does an outer check to see if it can find a pool that isn't disabled. If it can't, it later does an inner check to see if it can find a pool at all; at that point we know it should be disabled, so if we find one we can transfer some information out of it into a new pool that will be enabled. Why? Without knowing the reason the code was written like this I couldn't say, but I interpret it as deliberate rather than poor, so lacking information on the use case I'm being conservative about making changes to it.

Well, this specific dictionary is private, so relatively limited in scope and safer to refactor. I'd need to look more in-depth tomorrow, but at this point I can't really see any reason not to replace DbConnectionPoolGroup._connectionPoolGroups with a ConcurrentDictionary and remove all this complicated and lock-heavy code. @saurabh500
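
To make that concrete, the shape could be something like this (hypothetical: the real method has a richer signature, and CreatePoolGroup() stands in for the existing construction logic):

```csharp
// Requires System.Collections.Concurrent. GetOrAdd may invoke the factory on
// more than one thread for the same key, but only one result is ever published
// and returned to all callers; the losing thread's freshly created group is
// simply discarded, which the comments above argue is cheap.
private readonly ConcurrentDictionary<DbConnectionPoolKey, DbConnectionPoolGroup> _connectionPoolGroups =
    new ConcurrentDictionary<DbConnectionPoolKey, DbConnectionPoolGroup>();

internal DbConnectionPoolGroup GetConnectionPoolGroup(DbConnectionPoolKey key)
{
    return _connectionPoolGroups.GetOrAdd(key, k => CreatePoolGroup(k));
}
```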

The prune call does still lock, but only with itself, and if you're hitting prune you're not under high contention on that pool, because something in it has been allowed to sit idle long enough to be pruned.

The point is that before your PR, both prune and GetConnectionPoolGroup() locked on this (so could not run concurrently). Your PR changes GetConnectionPoolGroup() to use a different locking mechanism, but leaves prune as-is, so both these methods can now execute concurrently and interfere with one another. For example, pruning could start copying the dictionary just as GetConnectionPoolGroup() is adding a new pool group, which would be missed by prune and therefore forgotten when it assigns the new dictionary.

@Wraith2
Contributor Author

Wraith2 commented Apr 24, 2019

Your PR changes GetConnectionPoolGroup() to use a different locking mechanism, but leaves prune as-is,

No, I changed prune to use the same reader-writer lock, so they're still mutually exclusive in the same region they were before; but now there can be multiple readers in a section where only one was allowed before, and the lock is escalated only if an assignment is needed, which is rare.
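
The prune side is shaped roughly like this (BuildPrunedCopy() is a placeholder and the pruning logic is elided, so treat it as an illustration rather than the exact diff):

```csharp
// Only one thread can hold the upgradeable read lock, so prune stays mutually
// exclusive with itself, while plain readers in GetConnectionPoolGroup can still
// run concurrently. The write lock is taken only when there is something to publish.
_poolGroupsLock.EnterUpgradeableReadLock();
try
{
    Dictionary<DbConnectionPoolKey, DbConnectionPoolGroup> pruned = BuildPrunedCopy(_connectionPoolGroups);

    if (pruned.Count != _connectionPoolGroups.Count)
    {
        _poolGroupsLock.EnterWriteLock();
        try
        {
            _connectionPoolGroups = pruned;
        }
        finally
        {
            _poolGroupsLock.ExitWriteLock();
        }
    }
}
finally
{
    _poolGroupsLock.ExitUpgradeableReadLock();
}
```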

@roji
Member

roji commented Apr 24, 2019

Your PR changes GetConnectionPoolGroup() to use a different locking mechanism, but leaves prune as-is,

No, I changed prune to use the same reader-writer lock, so they're still mutually exclusive in the same region they were before; but now there can be multiple readers in a section where only one was allowed before, and the lock is escalated only if an assignment is needed, which is rare.

Sorry, my bad! I somehow missed the change in prune; reading code late at night can lead to that...

I'll try to have another look at the two types tomorrow and see what I can figure out.

@saurabh500
Contributor

There is indeed a Prune() in DbConnectionPoolGroup.cs as well, which uses a lock. I wonder if the confusion arose from that versus PruneConnectionPoolGroups(), which @Wraith2 actually changed.

@roji
Member

roji commented Apr 25, 2019

I looked at this again with fresh eyes, and would like to come back to what I wrote above:

DbConnectionFactory.GetConnectionPoolGroup(): the changes here seem to affect the scenario when a connection pool group isn't found. Doesn't this only happen once per pool key, which is supposed to be very rare? If so, I wouldn't expect the locking optimization inside to have any effect...

The contention happens at startup, when you go from no connections to as many as you can open, and this lock is what blocks every connection through that pool until it has been created. Allowing every caller to try to set up a pool and letting the first assigner win would spin up huge numbers of connections, so you really do need the lock, but I've tried to minimize what's done under it.

So this optimization only affects the very first call, when a pool doesn't yet exist for the given pool key (and user identity) - so it would have no effect whatsoever on the application's steady-state performance. In addition, from what I can tell, the time spent within these locks is quite negligible: no actual connections are being created (only DbConnectionPoolGroup and DbConnectionPool objects), and no I/O is being performed. @Wraith2, am I correct to assume that the DotTrace snapshots you posted above are the result of a run opening a single connection? In other words, your test scenario doesn't perform any iterations opening many connections?

Assuming I haven't grossly misunderstood things, it doesn't seem like this code specifically is worth optimizing: it will typically execute once in the application's lifetime (modulo pruning, which should also be rare), and we wouldn't see any actual effect on perf. We should instead concentrate on the hot path, where all the pooling data structures already exist and a connection is being allocated from (and returned to!) the pool.

It may be worth refactoring this code to simplify it - as I wrote above, it seems over-engineered and overly-complex for what it's doing. On the other hand, it's also working well and maybe it's better to just leave it alone.

@Wraith2
Contributor Author

Wraith2 commented Apr 25, 2019

am I correct to assume that the DotTrace snapshots you posted above are the result of a run opening a single connection? In other words, your test scenario doesn't perform any iterations opening many connections?

All my numbers are from DataAccessPerformance with 64 threads (on 16 cores, so 4x contention) for 60 seconds. I'm assuming the contention is all at the start and will affect time to first query because, as you say, I can't see why any of this would contend once the pools and groups are set up.

It may be worth refactoring this code to simplify it - as I wrote above, it seems over-engineered and overly-complex for what it's doing. On the other hand, it's also working well and maybe it's better to just leave it alone.

I was thinking about this last night and I have a theory. I suspect that the person who wrote it was a SQL Server internals expert and that what they've implemented is equivalent to an online index rebuild: it takes a copy of the data, works on it, and then does an atomic replace so that the offline/lock time is minimized.

As I mentioned elsewhere, this is the horror of the pooling code: it's all like this. There are gains to be had, but it's very hard to see them and to reason about their efficacy or safety. In general I agree with you that if something isn't a clear benefit it should be left alone. So the changes in DbConnectionPoolGroup.GetConnectionPool might not be worth the effort because they're a startup cost, but the DbConnectionFactory.GetConnectionPoolGroup reader-lock changes should be useful because they're on the common path of ConnectionString_Set, which is used whenever a connection is created.

@stephentoub
Member

should be useful

Can you share a benchmark that demonstrates a measurable throughput win?

@roji
Member

roji commented Apr 25, 2019

but the DbConnectionFactory.GetConnectionPoolGroup reader-lock changes should be useful because they're on the common path of ConnectionString_Set, which is used whenever a connection is created.

If I'm reading the code correctly, in the fast path (once everything is set up) the first TryGetValue() (the one outside the lock) should always succeed, so the code in question wouldn't be reached...

@stephentoub do we have any infrastructure for doing BDN in corefx? I can't find anything in particular; it would help greatly in cases such as this.

@stephentoub
Member

do we have any infrastructure for doing BDN in corefx? I can't find anything in particular; it would help greatly in cases such as this.

We have the whole dotnet/performance repo for exactly this purpose.
cc: @adamsitnik

@roji
Member

roji commented Apr 25, 2019

@stephentoub OK, thanks - was vaguely aware of it, will investigate and see with @Wraith2 what can be done for SqlClient.

@roji
Member

roji commented Apr 25, 2019

Oh great - there's a feature for running a locally-built corefx, perfect.

@Wraith2, so as part of any optimization work, it would be ideal to add a benchmark into https://github.com/dotnet/performance/tree/master/src/benchmarks/micro/corefx (creating a System.Data.SqlClient there), which demonstrates exactly what it is we're trying to optimize. Then we'd be able to see (and show) before and after results for the change, justifying it. I'll be happy to assist you if you're new to BenchmarkDotNet (it's really quite good).
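
To give an idea, a first benchmark there could be as small as the following (the connection string and scenario are placeholders, purely to illustrate what we'd measure):

```csharp
using System.Data.SqlClient;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class ConnectionOpenBenchmarks
{
    // Placeholder; the dotnet/performance harness would supply a real server.
    private const string ConnectionString =
        "Data Source=localhost;Initial Catalog=tempdb;Integrated Security=true";

    [Benchmark]
    public void OpenAndCloseFromWarmPool()
    {
        // After the first iteration the open is served from the connection pool,
        // which is exactly the path being discussed in this PR.
        using (var connection = new SqlConnection(ConnectionString))
        {
            connection.Open();
        }
    }
}

public static class Program
{
    public static void Main() => BenchmarkRunner.Run<ConnectionOpenBenchmarks>();
}
```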

@Wraith2
Contributor Author

Wraith2 commented Apr 25, 2019

Can you share a benchmark that demonstrates a measurable throughput win?

I've got local numbers that show a measurable but not very significant throughput increase of ~0.8%, but I'm not supposed to share them publicly because of SQL Server license conditions. The scale at which these changes would make a useful difference isn't something I can recreate sensibly on a single machine.

@saurabh500
Contributor

So this optimization only affects the very first call, when a pool doesn't yet exist for the given pool key (and user identity) - so it would have no effect whatsoever on the application's steady-state performance.

@Wraith2 if the data access benchmark is run to measure TPS after SqlClient has warmed up the connection pool with a few connections, will there still be any benefit from this PR?

@Wraith2
Contributor Author

Wraith2 commented Apr 25, 2019

I've got the timeline profiler API integrated now, so I can exclude the warmup (which was 3s all along). And no, there wouldn't be.

Steady-state contention is 60ms, and it's coming from Task machinery doing reflection and from DbConnectionPool.PrepareConnection, which I investigated and decided to leave alone due to all the careful comments in the code about the locking.
[profiler screenshot]

If you run it on a hardware and contention configuration closer to the TE benchmark you may get different results.

@roji
Member

roji commented Apr 25, 2019

Steady-state contention is 60ms, and it's coming from Task machinery doing reflection and from DbConnectionPool.PrepareConnection, which I investigated and decided to leave alone due to all the careful comments in the code about the locking.

That sounds like a promising direction. It seems that the lock is taken in DbConnectionPool.PrepareConnection to "prevent race conditions with Clear and ReclaimEmancipatedObjects" (PostPop). I'm guessing that a more general refactor is needed of how Clear/ReclaimEmancipatedObjects actually work, in order to remove the need for locking here - it's a shame to introduce locking contention at pooled open just because a pool clear may be happening at the same time.

Obviously any reflection and superfluous Task machinery is also a good target.

@Wraith2
Contributor Author

Wraith2 commented Apr 25, 2019

I think we're in general agreement that these changes aren't particularly useful other than at startup, which is a rare occurrence. If so, I'll close this and have a longer look at the only real issue we've identified, which is PrepareConnection.

On the topic of benchmarking with BDN: a lot of my previous PR results were based on https://github.com/Wraith2/SqlBench/blob/master/Program.cs, which is a project set up for benchmarking and profiling very focused, simple cases. It could be generalised to use any ADO driver and serve as a base for getting some ongoing internal numbers.

@roji
Member

roji commented Apr 25, 2019

I think we're in general agreement that these changes aren't particularly useful other than at startup, which is a rare occurrence. If so, I'll close this and have a longer look at the only real issue we've identified, which is PrepareConnection.

Makes sense to me.

On the topic of benchmarking with BDN: a lot of my previous PR results were based on https://github.com/Wraith2/SqlBench/blob/master/Program.cs, which is a project set up for benchmarking and profiling very focused, simple cases. It could be generalised to use any ADO driver and serve as a base for getting some ongoing internal numbers.

Great that you're already experienced with BDN - I never do any sort of optimization work without a BDN benchmark showing the difference.

Npgsql has a BDN suite - not perfect or very comprehensive, but it does benchmark some pretty common scenarios. For example, at some point, when optimizing the open mechanism (including some pooling work) I was working with this trivial benchmark.

@Wraith2 Wraith2 closed this Apr 25, 2019
@Wraith2 Wraith2 deleted the sqlperf-locking branch April 25, 2019 19:18
@karelz karelz added this to the 3.0 milestone May 4, 2019