-
Notifications
You must be signed in to change notification settings - Fork 4.9k
SqlClient handle socket network error gracefully #36664
Conversation
* Update dependencies from https://github.com/dotnet/arcade build 20190405.6 - Microsoft.DotNet.XUnitExtensions - 2.4.0-beta.19205.6 - Microsoft.DotNet.XUnitConsoleRunner - 2.5.1-beta.19205.6 - Microsoft.DotNet.VersionTools.Tasks - 1.0.0-beta.19205.6 - Microsoft.DotNet.ApiCompat - 1.0.0-beta.19205.6 - Microsoft.DotNet.Arcade.Sdk - 1.0.0-beta.19205.6 - Microsoft.DotNet.Build.Tasks.Configuration - 1.0.0-beta.19205.6 - Microsoft.DotNet.Build.Tasks.Feed - 2.2.0-beta.19205.6 - Microsoft.DotNet.Build.Tasks.Packaging - 1.0.0-beta.19205.6 - Microsoft.DotNet.CodeAnalysis - 1.0.0-beta.19205.6 - Microsoft.DotNet.CoreFxTesting - 1.0.0-beta.19205.6 - Microsoft.DotNet.GenAPI - 1.0.0-beta.19205.6 - Microsoft.DotNet.GenFacades - 1.0.0-beta.19205.6 - Microsoft.DotNet.Helix.Sdk - 2.0.0-beta.19205.6 - Microsoft.DotNet.RemoteExecutor - 1.0.0-beta.19205.6 - Microsoft.DotNet.SourceRewriter - 1.0.0-beta.19205.6 * correcting version number metadata
…406.5 (dotnet#36672) - Microsoft.NETCore.Platforms - 3.0.0-preview5.19206.5
* MovedInteropLocationForBrotli * Combined Brotli Files
In .NET Core 2.1 we added overloads of Send/ReceiveAsync, and we proactively added CancellationToken arguments to them, but those tokens were only checked at the beginning of the call; if a cancellation request came in after that check, the operation would not be interrupted. This PR plumbs the token through so that a cancellation request at any point in the operation will cancel that operation. On Windows we register to use CancelIoEx to request cancellation of the specific overlapped operation on the specific socket. On Unix we use the existing cancellation infrastructure already in place to support the existing custom queueing scheme. Some caveats: - On Windows, canceling a TCP receive will end up canceling all TCP receives pending on that socket, even when we request cancellation of a specific overlapped operation; this is just how cancellation works at the OS level, and there's little we can do about it. It also shouldn't matter much, as multiple pending receives on the same socket are rare. - If multiple concurrent receives or multiple concurrent sends are issued on the same socket, only the first will actually be cancelable. This is because this implementation only plumbs the token through the SocketAsyncEventArgs-based code paths, not the APM based code paths, and currently when using the Task-based APIs, we use the SocketAsyncEventArgs under the covers for only one receive and one send at a time; other receives made while that SAEA receive is in progress or other sends made while that SAEA send is in progress will fall back to using the APM code paths. This could be addressed in the future in various ways, including a) just using the SAEA code paths for all operations and deleting the APM fallback, or b) plumbing cancellation through APM as well. However, for now, this approach addresses the primary use case and should be sufficient. - This only affects code paths to which the CancellationToken passed to Send/ReceiveAsync could reach. If in the future we add additional overloads taking CancellationToken, we will likely need to plumb it to more places.
* Update dependencies from https://github.com/dotnet/coreclr build 20190403.72 - Microsoft.NET.Sdk.IL - 3.0.0-preview4-27603-72 - Microsoft.NETCore.ILAsm - 3.0.0-preview4-27603-72 - Microsoft.NETCore.Runtime.CoreCLR - 3.0.0-preview4-27603-72 * Update dependencies from https://github.com/dotnet/coreclr build 20190404.75 - Microsoft.NET.Sdk.IL - 3.0.0-preview4-27604-75 - Microsoft.NETCore.ILAsm - 3.0.0-preview4-27604-75 - Microsoft.NETCore.Runtime.CoreCLR - 3.0.0-preview4-27604-75 * Update dependencies from https://github.com/dotnet/coreclr build 20190405.73 - Microsoft.NET.Sdk.IL - 3.0.0-preview5-27605-73 - Microsoft.NETCore.ILAsm - 3.0.0-preview5-27605-73 - Microsoft.NETCore.Runtime.CoreCLR - 3.0.0-preview5-27605-73 * Update dependencies from https://github.com/dotnet/coreclr build 20190406.72 - Microsoft.NET.Sdk.IL - 3.0.0-preview5-27606-72 - Microsoft.NETCore.ILAsm - 3.0.0-preview5-27606-72 - Microsoft.NETCore.Runtime.CoreCLR - 3.0.0-preview5-27606-72 * Fix argument name in Buffer.BlockCopy test * Update StreamWriters tests for BaseStream dispose change * Updated tests for new PtrToStringUTF8 behavior
* Add support for trailing commas with single-segment tests. * Implement single-segment case and update tests. * Update multi-segment path and add tests. * Add more tests, add to state, and update rollback logic.
…so we get automatic updates from arcade. (dotnet#36685)
Consume Serialization Guard API in BinaryFormatter and Process. Also includes Serialization Guard's tests.
* try to add Alpine ARM64 queue * remove extra line * correct queue names * update queues * use open queue for CI * final cleanup * feedback from review
* Update dependencies from https://github.com/dotnet/arcade build 20190406.5 - Microsoft.DotNet.XUnitExtensions - 2.4.0-beta.19206.5 - Microsoft.DotNet.XUnitConsoleRunner - 2.5.1-beta.19206.5 - Microsoft.DotNet.VersionTools.Tasks - 1.0.0-beta.19206.5 - Microsoft.DotNet.ApiCompat - 1.0.0-beta.19206.5 - Microsoft.DotNet.Arcade.Sdk - 1.0.0-beta.19206.5 - Microsoft.DotNet.Build.Tasks.Configuration - 1.0.0-beta.19206.5 - Microsoft.DotNet.Build.Tasks.Feed - 2.2.0-beta.19206.5 - Microsoft.DotNet.Build.Tasks.Packaging - 1.0.0-beta.19206.5 - Microsoft.DotNet.CodeAnalysis - 1.0.0-beta.19206.5 - Microsoft.DotNet.CoreFxTesting - 1.0.0-beta.19206.5 - Microsoft.DotNet.GenAPI - 1.0.0-beta.19206.5 - Microsoft.DotNet.Helix.Sdk - 2.0.0-beta.19206.5 - Microsoft.DotNet.RemoteExecutor - 1.0.0-beta.19206.5 - Microsoft.DotNet.SourceRewriter - 1.0.0-beta.19206.5 * Update dependencies from https://github.com/dotnet/arcade build 20190407.1 - Microsoft.DotNet.XUnitExtensions - 2.4.0-beta.19207.1 - Microsoft.DotNet.XUnitConsoleRunner - 2.5.1-beta.19207.1 - Microsoft.DotNet.VersionTools.Tasks - 1.0.0-beta.19207.1 - Microsoft.DotNet.ApiCompat - 1.0.0-beta.19207.1 - Microsoft.DotNet.Arcade.Sdk - 1.0.0-beta.19207.1 - Microsoft.DotNet.Build.Tasks.Configuration - 1.0.0-beta.19207.1 - Microsoft.DotNet.Build.Tasks.Feed - 2.2.0-beta.19207.1 - Microsoft.DotNet.Build.Tasks.Packaging - 1.0.0-beta.19207.1 - Microsoft.DotNet.CodeAnalysis - 1.0.0-beta.19207.1 - Microsoft.DotNet.CoreFxTesting - 1.0.0-beta.19207.1 - Microsoft.DotNet.GenAPI - 1.0.0-beta.19207.1 - Microsoft.DotNet.Helix.Sdk - 2.0.0-beta.19207.1 - Microsoft.DotNet.RemoteExecutor - 1.0.0-beta.19207.1 - Microsoft.DotNet.SourceRewriter - 1.0.0-beta.19207.1
* Update APICompat Baselines * Fix baslines for UAP/UAPAOT * Fix baseline for UAPAOT
* making corefx compatible with newGenfacadesTask correcting the new runtime variable updating genfacades version to latest fixes uap build and removes extra seed ype preferences for uapaot * adding build configuration, comments and updating genfacadwa version * fixing build for netcoreapaot * removing netcoreappaot build from all configurations build
* Update coding-style.md Removed info about corefx.vssettings since it was removed at dotnet#35683 * Update coding-style.md
…e the readonly members feature. (dotnet#36702)
…t#36718) * Change charsets to explicitly unicode. * Remove ActiveIssue.
* Use common .tools dir for global tools
* Update dependencies from https://github.com/dotnet/standard build 20190406.1 - NETStandard.Library - 2.1.0-prerelease.19206.1 * Update dependencies from https://github.com/dotnet/standard build 20190407.1 - NETStandard.Library - 2.1.0-prerelease.19207.1 * Update dependencies from https://github.com/dotnet/standard build 20190408.1 - NETStandard.Library - 2.1.0-prerelease.19208.1
…0408.73 (dotnet#36728) - Microsoft.NET.Sdk.IL - 3.0.0-preview5-27608-73 - Microsoft.NETCore.ILAsm - 3.0.0-preview5-27608-73 - Microsoft.NETCore.Runtime.CoreCLR - 3.0.0-preview5-27608-73
* Add initial nullability annotations guidance * Address PR feedback
* Add more misc type converters tests * PR feedback
* Add more ArrayConverter tests and cleanup code * Disable netfx test
An HttpListener websocket test was failing after the change from PR dotnet#36928. That PR made changes to tighten up the transition to the Closed state after the closing handshake had completed. That PR missed some additional changes needed especially in cases where a server (such as loopback) sends the close frame almost concurrently with the client sending the close frame. This PR fixes the close frame handshake logic and also makes some optimizations in cases where the websocket has already transitioned to the Aborted state during the closing handshake. In that case, the websocket should not wait for a close frame to be received but should simply proceed to closing the connection. Fixes #36963
…#36946) * Fix SocketsHttpHandler response streams to do sync I/O in sync methods SocketsHttpHandler hands back response Streams for reading response body content. While we encourage developers to use the async Stream APIs, Stream does expose synchronous APIs, yet the current implementations are just wrapping the async ones and doing sync-over-async. This fixes the response stream synchronous APIs to be sync down to the underlying networking stream. It also fixes a couple other minor issues, e.g. Flush{Async} on read-only stream should be nops rather than throwing, we should include an error message about a stream being read-only when trying to write to it, etc. * Address PR feedback
…dotnet#36961) * Add an in-box array-backed IBufferWriter<T> * Update Utf8JsonWriter ref and main writer file. * Fix JsonWriter WriteValue APIs. * Use Environment.NewLine static and invert some stream conditions. * Update JsonWriter properties and fix serializer build breaks. * Update JsonWriter unit tests. * Add xml comments, clean up, and improve test coverage. * Update JsonDocument and JsonSerializer to react to JsonWriter changes. * Normalize the reference assembly. * Do not escape/validate comments and update issue number. * Do not escape comments and validate they don't contain embedded delimiter * Remove dead code and update issue number in comments. * Throw InvalidOEx instead of ArgEx if IBW doesn't give requested memory. * Fix test build breaks for netfx. * Remove dead code and fix source packaging. * Address PR feedback. * Disable writing floats test on windows * 8 digit floats don't work well on older TFMs. Reduce to 7.
* handle bad Expires better * update tests * internal -> private * feedback from review
…90417.2 (dotnet#36997) - NETStandard.Library - 2.1.0-prerelease.19217.2
…0417.73 (dotnet#36998) - Microsoft.NET.Sdk.IL - 3.0.0-preview5-27617-73 - Microsoft.NETCore.ILAsm - 3.0.0-preview5-27617-73 - Microsoft.NETCore.Runtime.CoreCLR - 3.0.0-preview5-27617-73
…0190418.02 (dotnet#36995) - Microsoft.NETCore.App - 3.0.0-preview5-27618-02 - Microsoft.NETCore.DotNetHost - 3.0.0-preview5-27618-02 - Microsoft.NETCore.DotNetHostPolicy - 3.0.0-preview5-27618-02
…417.12 (dotnet#36996) - Microsoft.NETCore.Platforms - 3.0.0-preview5.19217.12
- lock the entire Write operation - Added a test that Completes both reader and writer in the middle of writing a large buffer
|
You suggested it but I wasn't sure how it could be usefully accomplished, I wasn't aware it was blocking unrelated PRs. As I explained in the other thread I wasn't able to reproduce the problem I'd introduced in debug mode so running the tests wouldn't be particularly useful. I can add something which spends a specific time querying on CPUCount*4 threads but it isn't a deterministic test it's straying into mean time to failure testing which isn't what the integration testing is really for. Aside from the new required test does this code make sense, seem correct and otherwise function? |
|
I had a cursory look and it seems good. Will look in depth tomorrow. |
|
So what was the inconsistent state that was being reached? I understand that the PR plumbs errors all the way up. What really did this fix? |
|
When the socket was closed the code was following the path for receipt of a packet but there was no data in the packet so an caught the problem later on. Once that is fixed with the error propagation a corner case can turn up where the callback could be invoked after the connection was in the closed state, this is fixed by the slight change to the null check in the receive method. |
|
|
||
| // The mars physical connection can get a callback | ||
| // with a packet but no result after the connection is closed. | ||
| if (source == null && _parser._pMarsPhysicalConObj == this) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This condition removal doesn't seem right. This shouldn't be done in Native SNI. In native SNI, all callbacks happen on the same thread and this differentiation of the physical connection object is the only way to differentiate if the callback corresponds to this object.
Also why is this correct for Managed SNI?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because I've introduced the close method on disconnection the physical connection will now be nulled, so checking if it's the current instance doesn't help. This catches the case where the socket is closed while the socket wait is in progress which causes a 0 packet receive.
The socket in the native version is handled in the sni lib and the same condition doesn't occur there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because I've introduced the close method on disconnection the physical connection will now be nulled
What close method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In SNIMarsHadle.HandleReceiveError,
if (sniErrorCode == TdsEnums.SNI_WSAECONNRESET)
{
TdsParser parser = _callbackObject.Parser;
parser.State = TdsParserState.Broken;
parser.Connection.BreakConnection();
}BreakConnection dooms closes and does a ton of other stuff one of which is nulling the state object. Since the only two values that object can have are null and this and if it's either null or this and the source tcs is null it's not going to work the check on the state object seemed redundant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The BreakConnection, only impacts breaking the connection for Managed SNI, this code would never run for Native SNI, and in that case the removal of this check can have unpredictable impact when Native SNI is being used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Then I'm going to have to put that back in and do some debugging and tracing to repo the problem it fixes and see if there's another way to detect it strictly in the managed side. I think if i'd seen one I would have used it so this might be tricky.
| /// <param name="callback">Completion callback</param> | ||
| public void ReadFromStreamAsync(Stream stream, SNIAsyncCallback callback) | ||
| { | ||
| // Treat local function as a static and pass all params otherwise as async will allocate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you retain the comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can but it's no longer needed since a static can't refer to outer locals, sure you want to keep it?
src/System.Data.SqlClient/src/System/Data/SqlClient/SNI/SNIMarsHandle.cs
Show resolved
Hide resolved
|
I'm going to close this because the git history is a mess, I'm investigating and will rework and open a new PR when it's ready. |
Fixes https://github.com/dotnet/corefx/issues/33930
Then a running the SqlClient test suite in managed mode (Linux, osx, uap) and Debug build, an assert fires indicating that an invalid state has been reached. The assertion indicates that a packet has reached the processing stage and has no contents or that the packet has been received in a state where none was expected. The test failure was caused by an assertion but could lead to silent unexpected non-crashing behaviour in release builds leading to unreliability.
After investigation and debugging I've arrived at three related changes.
The functional and manual tests pass correctly with the affected tests re-enabled.
/cc @afsanehr, @tarikulsabbir, @Gary-Zh , @David-Engel @saurabh500
Minor changes also included are:
Made the async nested functions static in the packet read so that you don't have to be careful not to refer to outer function variables and degrade performance.
Changed the callback object in mars handle and connection to be strongly typed since it was object and always cast to the concrete type before use.