Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Conversation

@Wraith2
Copy link
Contributor

@Wraith2 Wraith2 commented Apr 6, 2019

Fixes https://github.com/dotnet/corefx/issues/33930

Then a running the SqlClient test suite in managed mode (Linux, osx, uap) and Debug build, an assert fires indicating that an invalid state has been reached. The assertion indicates that a packet has reached the processing stage and has no contents or that the packet has been received in a state where none was expected. The test failure was caused by an assertion but could lead to silent unexpected non-crashing behaviour in release builds leading to unreliability.

After investigation and debugging I've arrived at three related changes.

  • plumb the error message from the packet read function all the way through the handler chain so that the recipients can identify what the error was
  • change the read function so that socket disconnections use an existing but unused error code to identify that the connection has been reset and add specific error catching for disconnections.
  • detect disconnections using the above changes and close the connection gracefully, and change a check for null network packet sources so that unexpected packets in the now closed state are correctly ignored.

The functional and manual tests pass correctly with the affected tests re-enabled.
/cc @afsanehr, @tarikulsabbir, @Gary-Zh , @David-Engel @saurabh500

Minor changes also included are:
Made the async nested functions static in the packet read so that you don't have to be careful not to refer to outer function variables and degrade performance.
Changed the callback object in mars handle and connection to be strongly typed since it was object and always cast to the concrete type before use.

Wraith2 and others added 2 commits April 6, 2019 19:19
* Update dependencies from https://github.com/dotnet/arcade build 20190405.6

- Microsoft.DotNet.XUnitExtensions - 2.4.0-beta.19205.6
- Microsoft.DotNet.XUnitConsoleRunner - 2.5.1-beta.19205.6
- Microsoft.DotNet.VersionTools.Tasks - 1.0.0-beta.19205.6
- Microsoft.DotNet.ApiCompat - 1.0.0-beta.19205.6
- Microsoft.DotNet.Arcade.Sdk - 1.0.0-beta.19205.6
- Microsoft.DotNet.Build.Tasks.Configuration - 1.0.0-beta.19205.6
- Microsoft.DotNet.Build.Tasks.Feed - 2.2.0-beta.19205.6
- Microsoft.DotNet.Build.Tasks.Packaging - 1.0.0-beta.19205.6
- Microsoft.DotNet.CodeAnalysis - 1.0.0-beta.19205.6
- Microsoft.DotNet.CoreFxTesting - 1.0.0-beta.19205.6
- Microsoft.DotNet.GenAPI - 1.0.0-beta.19205.6
- Microsoft.DotNet.GenFacades - 1.0.0-beta.19205.6
- Microsoft.DotNet.Helix.Sdk - 2.0.0-beta.19205.6
- Microsoft.DotNet.RemoteExecutor - 1.0.0-beta.19205.6
- Microsoft.DotNet.SourceRewriter - 1.0.0-beta.19205.6

* correcting version number metadata
@Wraith2 Wraith2 changed the title SqlClient handle socket network error gradefully SqlClient handle socket network error gracefully Apr 7, 2019
dotnet-maestro bot and others added 27 commits April 7, 2019 14:01
…406.5 (dotnet#36672)

- Microsoft.NETCore.Platforms - 3.0.0-preview5.19206.5
* MovedInteropLocationForBrotli

* Combined Brotli Files
In .NET Core 2.1 we added overloads of Send/ReceiveAsync, and we proactively added CancellationToken arguments to them, but those tokens were only checked at the beginning of the call; if a cancellation request came in after that check, the operation would not be interrupted.

This PR plumbs the token through so that a cancellation request at any point in the operation will cancel that operation.  On Windows we register to use CancelIoEx to request cancellation of the specific overlapped operation on the specific socket.  On Unix we use the existing cancellation infrastructure already in place to support the existing custom queueing scheme.

Some caveats:
- On Windows, canceling a TCP receive will end up canceling all TCP receives pending on that socket, even when we request cancellation of a specific overlapped operation; this is just how cancellation works at the OS level, and there's little we can do about it.  It also shouldn't matter much, as multiple pending receives on the same socket are rare.
- If multiple concurrent receives or multiple concurrent sends are issued on the same socket, only the first will actually be cancelable.  This is because this implementation only plumbs the token through the SocketAsyncEventArgs-based code paths, not the APM based code paths, and currently when using the Task-based APIs, we use the SocketAsyncEventArgs under the covers for only one receive and one send at a time; other receives made while that SAEA receive is in progress or other sends made while that SAEA send is in progress will fall back to using the APM code paths.  This could be addressed in the future in various ways, including a) just using the SAEA code paths for all operations and deleting the APM fallback, or b) plumbing cancellation through APM as well.  However, for now, this approach addresses the primary use case and should be sufficient.
- This only affects code paths to which the CancellationToken passed to Send/ReceiveAsync could reach.  If in the future we add additional overloads taking CancellationToken, we will likely need to plumb it to more places.
* Update dependencies from https://github.com/dotnet/coreclr build 20190403.72

- Microsoft.NET.Sdk.IL - 3.0.0-preview4-27603-72
- Microsoft.NETCore.ILAsm - 3.0.0-preview4-27603-72
- Microsoft.NETCore.Runtime.CoreCLR - 3.0.0-preview4-27603-72

* Update dependencies from https://github.com/dotnet/coreclr build 20190404.75

- Microsoft.NET.Sdk.IL - 3.0.0-preview4-27604-75
- Microsoft.NETCore.ILAsm - 3.0.0-preview4-27604-75
- Microsoft.NETCore.Runtime.CoreCLR - 3.0.0-preview4-27604-75

* Update dependencies from https://github.com/dotnet/coreclr build 20190405.73

- Microsoft.NET.Sdk.IL - 3.0.0-preview5-27605-73
- Microsoft.NETCore.ILAsm - 3.0.0-preview5-27605-73
- Microsoft.NETCore.Runtime.CoreCLR - 3.0.0-preview5-27605-73

* Update dependencies from https://github.com/dotnet/coreclr build 20190406.72

- Microsoft.NET.Sdk.IL - 3.0.0-preview5-27606-72
- Microsoft.NETCore.ILAsm - 3.0.0-preview5-27606-72
- Microsoft.NETCore.Runtime.CoreCLR - 3.0.0-preview5-27606-72

* Fix argument name in Buffer.BlockCopy test

* Update StreamWriters tests for BaseStream dispose change

* Updated tests for new PtrToStringUTF8 behavior
* Add support for trailing commas with single-segment tests.

* Implement single-segment case and update tests.

* Update multi-segment path and add tests.

* Add more tests, add to state, and update rollback logic.
Consume Serialization Guard API in BinaryFormatter and Process.

Also includes Serialization Guard's tests.
* try to add Alpine ARM64 queue

* remove extra line

* correct queue names

* update queues

* use open queue for CI

* final cleanup

* feedback from review
* Update dependencies from https://github.com/dotnet/arcade build 20190406.5

- Microsoft.DotNet.XUnitExtensions - 2.4.0-beta.19206.5
- Microsoft.DotNet.XUnitConsoleRunner - 2.5.1-beta.19206.5
- Microsoft.DotNet.VersionTools.Tasks - 1.0.0-beta.19206.5
- Microsoft.DotNet.ApiCompat - 1.0.0-beta.19206.5
- Microsoft.DotNet.Arcade.Sdk - 1.0.0-beta.19206.5
- Microsoft.DotNet.Build.Tasks.Configuration - 1.0.0-beta.19206.5
- Microsoft.DotNet.Build.Tasks.Feed - 2.2.0-beta.19206.5
- Microsoft.DotNet.Build.Tasks.Packaging - 1.0.0-beta.19206.5
- Microsoft.DotNet.CodeAnalysis - 1.0.0-beta.19206.5
- Microsoft.DotNet.CoreFxTesting - 1.0.0-beta.19206.5
- Microsoft.DotNet.GenAPI - 1.0.0-beta.19206.5
- Microsoft.DotNet.Helix.Sdk - 2.0.0-beta.19206.5
- Microsoft.DotNet.RemoteExecutor - 1.0.0-beta.19206.5
- Microsoft.DotNet.SourceRewriter - 1.0.0-beta.19206.5

* Update dependencies from https://github.com/dotnet/arcade build 20190407.1

- Microsoft.DotNet.XUnitExtensions - 2.4.0-beta.19207.1
- Microsoft.DotNet.XUnitConsoleRunner - 2.5.1-beta.19207.1
- Microsoft.DotNet.VersionTools.Tasks - 1.0.0-beta.19207.1
- Microsoft.DotNet.ApiCompat - 1.0.0-beta.19207.1
- Microsoft.DotNet.Arcade.Sdk - 1.0.0-beta.19207.1
- Microsoft.DotNet.Build.Tasks.Configuration - 1.0.0-beta.19207.1
- Microsoft.DotNet.Build.Tasks.Feed - 2.2.0-beta.19207.1
- Microsoft.DotNet.Build.Tasks.Packaging - 1.0.0-beta.19207.1
- Microsoft.DotNet.CodeAnalysis - 1.0.0-beta.19207.1
- Microsoft.DotNet.CoreFxTesting - 1.0.0-beta.19207.1
- Microsoft.DotNet.GenAPI - 1.0.0-beta.19207.1
- Microsoft.DotNet.Helix.Sdk - 2.0.0-beta.19207.1
- Microsoft.DotNet.RemoteExecutor - 1.0.0-beta.19207.1
- Microsoft.DotNet.SourceRewriter - 1.0.0-beta.19207.1
* Update APICompat Baselines

* Fix baslines for UAP/UAPAOT

* Fix baseline for UAPAOT
* making corefx compatible with newGenfacadesTask

correcting the new runtime variable

updating genfacades version to latest

fixes uap build and removes extra seed ype preferences for uapaot

* adding build configuration, comments and updating genfacadwa version

* fixing build  for netcoreapaot

* removing netcoreappaot build from all configurations build
* Update coding-style.md

Removed info about corefx.vssettings since it was removed at dotnet#35683

* Update coding-style.md
…t#36718)

* Change charsets to explicitly unicode.

* Remove ActiveIssue.
* Use common .tools dir for global tools
* Update dependencies from https://github.com/dotnet/standard build 20190406.1

- NETStandard.Library - 2.1.0-prerelease.19206.1

* Update dependencies from https://github.com/dotnet/standard build 20190407.1

- NETStandard.Library - 2.1.0-prerelease.19207.1

* Update dependencies from https://github.com/dotnet/standard build 20190408.1

- NETStandard.Library - 2.1.0-prerelease.19208.1
…0408.73 (dotnet#36728)

- Microsoft.NET.Sdk.IL - 3.0.0-preview5-27608-73
- Microsoft.NETCore.ILAsm - 3.0.0-preview5-27608-73
- Microsoft.NETCore.Runtime.CoreCLR - 3.0.0-preview5-27608-73
* Add initial nullability annotations guidance

* Address PR feedback
*  Add more misc type converters tests

* PR feedback
* Add more ArrayConverter tests and cleanup code

* Disable netfx test
ericstj and others added 12 commits April 17, 2019 18:11
An HttpListener websocket test was failing after the change from PR dotnet#36928. That PR made changes
to tighten up the transition to the Closed state after the closing handshake had completed.

That PR missed some additional changes needed especially in cases where a server (such as loopback)
sends the close frame almost concurrently with the client sending the close frame.

This PR fixes the close frame handshake logic and also makes some optimizations in cases where the
websocket has already transitioned to the Aborted state during the closing handshake. In that case,
the websocket should not wait for a close frame to be received but should simply proceed to closing
the connection.

Fixes #36963
…#36946)

* Fix SocketsHttpHandler response streams to do sync I/O in sync methods

SocketsHttpHandler hands back response Streams for reading response body content.  While we encourage developers to use the async Stream APIs, Stream does expose synchronous APIs, yet the current implementations are just wrapping the async ones and doing sync-over-async.

This fixes the response stream synchronous APIs to be sync down to the underlying networking stream.

It also fixes a couple other minor issues, e.g. Flush{Async} on read-only stream should be nops rather than throwing, we should include an error message about a stream being read-only when trying to write to it, etc.

* Address PR feedback
…dotnet#36961)

* Add an in-box array-backed IBufferWriter<T>

* Update Utf8JsonWriter ref and main writer file.

* Fix JsonWriter WriteValue APIs.

* Use Environment.NewLine static and invert some stream conditions.

* Update JsonWriter properties and fix serializer build breaks.

* Update JsonWriter unit tests.

* Add xml comments, clean up, and improve test coverage.

* Update JsonDocument and JsonSerializer to react to JsonWriter changes.

* Normalize the reference assembly.

* Do not escape/validate comments and update issue number.

* Do not escape comments and validate they don't contain embedded
delimiter

* Remove dead code and update issue number in comments.

* Throw InvalidOEx instead of ArgEx if IBW doesn't give requested memory.

* Fix test build breaks for netfx.

* Remove dead code and fix source packaging.

* Address PR feedback.

* Disable writing floats test on windows

* 8 digit floats don't work well on older TFMs. Reduce to 7.
* handle bad Expires better

* update tests

* internal -> private

* feedback from review
…0417.73 (dotnet#36998)

- Microsoft.NET.Sdk.IL - 3.0.0-preview5-27617-73
- Microsoft.NETCore.ILAsm - 3.0.0-preview5-27617-73
- Microsoft.NETCore.Runtime.CoreCLR - 3.0.0-preview5-27617-73
…0190418.02 (dotnet#36995)

- Microsoft.NETCore.App - 3.0.0-preview5-27618-02
- Microsoft.NETCore.DotNetHost - 3.0.0-preview5-27618-02
- Microsoft.NETCore.DotNetHostPolicy - 3.0.0-preview5-27618-02
…417.12 (dotnet#36996)

- Microsoft.NETCore.Platforms - 3.0.0-preview5.19217.12
- lock the entire Write operation
- Added a test that Completes both reader and writer in the middle of writing a large buffer
@AfsanehR-zz
Copy link
Contributor

As mentioned in #36735 we will wait on merging these PRs once we have the DataAccess performance tests. @Wraith2 I was just wondering if you get a chance to work on porting these tests?

@Wraith2
Copy link
Contributor Author

Wraith2 commented Apr 18, 2019

You suggested it but I wasn't sure how it could be usefully accomplished, I wasn't aware it was blocking unrelated PRs. As I explained in the other thread I wasn't able to reproduce the problem I'd introduced in debug mode so running the tests wouldn't be particularly useful. I can add something which spends a specific time querying on CPUCount*4 threads but it isn't a deterministic test it's straying into mean time to failure testing which isn't what the integration testing is really for.

Aside from the new required test does this code make sense, seem correct and otherwise function?

@saurabh500
Copy link
Contributor

I had a cursory look and it seems good. Will look in depth tomorrow.

@saurabh500
Copy link
Contributor

So what was the inconsistent state that was being reached? I understand that the PR plumbs errors all the way up. What really did this fix?

@Wraith2
Copy link
Contributor Author

Wraith2 commented Apr 19, 2019

When the socket was closed the code was following the path for receipt of a packet but there was no data in the packet so an caught the problem later on. Once that is fixed with the error propagation a corner case can turn up where the callback could be invoked after the connection was in the closed state, this is fixed by the slight change to the null check in the receive method.


// The mars physical connection can get a callback
// with a packet but no result after the connection is closed.
if (source == null && _parser._pMarsPhysicalConObj == this)
Copy link
Contributor

@saurabh500 saurabh500 Apr 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition removal doesn't seem right. This shouldn't be done in Native SNI. In native SNI, all callbacks happen on the same thread and this differentiation of the physical connection object is the only way to differentiate if the callback corresponds to this object.

Also why is this correct for Managed SNI?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I've introduced the close method on disconnection the physical connection will now be nulled, so checking if it's the current instance doesn't help. This catches the case where the socket is closed while the socket wait is in progress which causes a 0 packet receive.

The socket in the native version is handled in the sni lib and the same condition doesn't occur there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I've introduced the close method on disconnection the physical connection will now be nulled

What close method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In SNIMarsHadle.HandleReceiveError,

if (sniErrorCode == TdsEnums.SNI_WSAECONNRESET)
{
    TdsParser parser = _callbackObject.Parser;
    parser.State = TdsParserState.Broken;
    parser.Connection.BreakConnection();
}

BreakConnection dooms closes and does a ton of other stuff one of which is nulling the state object. Since the only two values that object can have are null and this and if it's either null or this and the source tcs is null it's not going to work the check on the state object seemed redundant.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The BreakConnection, only impacts breaking the connection for Managed SNI, this code would never run for Native SNI, and in that case the removal of this check can have unpredictable impact when Native SNI is being used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Then I'm going to have to put that back in and do some debugging and tracing to repo the problem it fixes and see if there's another way to detect it strictly in the managed side. I think if i'd seen one I would have used it so this might be tricky.

/// <param name="callback">Completion callback</param>
public void ReadFromStreamAsync(Stream stream, SNIAsyncCallback callback)
{
// Treat local function as a static and pass all params otherwise as async will allocate
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you retain the comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can but it's no longer needed since a static can't refer to outer locals, sure you want to keep it?

@Wraith2
Copy link
Contributor Author

Wraith2 commented Apr 20, 2019

I'm going to close this because the git history is a mess, I'm investigating and will rework and open a new PR when it's ready.

@Wraith2 Wraith2 closed this Apr 20, 2019
@Wraith2 Wraith2 deleted the sqlfix-33930 branch April 20, 2019 13:06
@karelz karelz added this to the 3.0 milestone May 22, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.