@rzikm
Member

@rzikm rzikm commented May 18, 2022

This PR fixes a data race that we occasionally hit when running Http.Functional.Tests in a tight loop.

The data race this PR fixes occurs like this:

  • The MsQuic thread indicates that our write side was aborted by the peer (in HandleEventPeerRecvAborted) and sets _state.SendState = SendState.Aborted.
  • The application thread attempts to send data on the stream, and in SetupWriteStartState we check whether _state.SendState == SendState.Aborted. This check is done outside the lock to avoid locking when the stream is in a terminal state.
  • The application thread continues and reads _state.SendErrorCode, which the MsQuic thread has not set yet, so we throw QuicOperationAbortedException instead of QuicStreamAbortedException with the appropriate error code.
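
The interleaving above can be replayed deterministically on a single thread. The sketch below is a simplified model, not the MsQuicStream code: the StreamState type, the DescribeFailure helper, and the -1 "not set yet" sentinel are assumptions made for illustration, and the exception names are returned as strings rather than thrown.

```csharp
using System;

enum SendState { None, Aborted }

// Simplified stand-in for the stream's shared state (hypothetical type).
sealed class StreamState
{
    public SendState SendState = SendState.None;
    public long SendErrorCode = -1; // assumption: -1 means "no error code recorded yet"
}

static class RaceSketch
{
    // Models the lock-free check: returns the name of the exception the writer would get.
    public static string DescribeFailure(StreamState state)
    {
        if (state.SendState != SendState.Aborted)
        {
            return "none";
        }

        long errorCode = state.SendErrorCode;
        return errorCode == -1
            ? "QuicOperationAbortedException"
            : $"QuicStreamAbortedException({errorCode})";
    }

    public static void Main()
    {
        var state = new StreamState();

        // Step 1: the MsQuic thread has published the state but not yet the error code.
        state.SendState = SendState.Aborted;

        // Steps 2 and 3: the application thread runs inside that window.
        Console.WriteLine(DescribeFailure(state));

        // The MsQuic thread finishes; a later check sees the error code.
        state.SendErrorCode = 42;
        Console.WriteLine(DescribeFailure(state));
    }
}
```

The first line printed is the unexpected QuicOperationAbortedException case from the failing test; the second is the result once both fields are visible.
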
Example test failure due to the race
Failed System.Net.Http.Functional.Tests.SocketsHttpHandlerTest_Http3_MsQuic.RequestSentResponseDisposed_ThrowsOnServer [43 ms]
  Error Message:
   System.AggregateException : One or more errors occurred. (Sending has already been aborted on the stream) (Sending has already been aborted on the stream)
---- System.Net.Quic.QuicOperationAbortedException : Sending has already been aborted on the stream
---- System.Net.Quic.QuicOperationAbortedException : Sending has already been aborted on the stream
  Stack Trace:
     at System.Threading.Tasks.TaskTimeoutExtensions.WhenAllOrAnyFailed(Task[] tasks) in /home/rzikm/dotnet-runtime/src/libraries/Common/tests/System/Threading/Tasks/TaskTimeoutExtensions.cs:line 88
   at System.Threading.Tasks.TaskTimeoutExtensions.WhenAllOrAnyFailed(Task[] tasks, Int32 millisecondsTimeout) in /home/rzikm/dotnet-runtime/src/libraries/Common/tests/System/Threading/Tasks/TaskTimeoutExtensions.cs:line 55
   at System.Net.Http.Functional.Tests.HttpClientHandlerTest_Http3.RequestSentResponseDisposed_ThrowsOnServer() in /home/rzikm/dotnet-runtime/src/libraries/System.Net.Http/tests/FunctionalTests/HttpClientHandlerTest.Http3.cs:line 367
--- End of stack trace from previous location ---
----- Inner Stack Trace #1 (System.Net.Quic.QuicOperationAbortedException) -----
   at System.Net.Quic.Implementations.MsQuic.MsQuicStream.SetupWriteStartState(Boolean emptyBuffer, CancellationToken cancellationToken) in /home/rzikm/dotnet-runtime/src/libraries/System.Net.Quic/src/System/Net/Quic/Implementations/MsQuic/MsQuicStream.cs:line 342
   at System.Net.Quic.Implementations.MsQuic.MsQuicStream.WriteAsync(ReadOnlyMemory`1 buffer, Boolean endStream, CancellationToken cancellationToken) in /home/rzikm/dotnet-runtime/src/libraries/System.Net.Quic/src/System/Net/Quic/Implementations/MsQuic/MsQuicStream.cs:line 307
   at System.Net.Test.Common.Http3LoopbackStream.SendFrameHeaderAsync(Int64 frameType, Int32 payloadLength) in /home/rzikm/dotnet-runtime/src/libraries/Common/tests/System/Net/Http/Http3LoopbackStream.cs:line 162
   at System.Net.Test.Common.Http3LoopbackStream.SendFrameAsync(Int64 frameType, ReadOnlyMemory`1 framePayload) in /home/rzikm/dotnet-runtime/src/libraries/Common/tests/System/Net/Http/Http3LoopbackStream.cs:line 167
   at System.Net.Test.Common.Http3LoopbackStream.SendDataFrameAsync(ReadOnlyMemory`1 data) in /home/rzikm/dotnet-runtime/src/libraries/Common/tests/System/Net/Http/Http3LoopbackStream.cs:line 140
   at System.Net.Test.Common.Http3LoopbackStream.SendResponseBodyAsync(Byte[] content, Boolean isFinal) in /home/rzikm/dotnet-runtime/src/libraries/Common/tests/System/Net/Http/Http3LoopbackStream.cs:line 283
   at System.Net.Http.Functional.Tests.HttpClientHandlerTest_Http3.<>c__DisplayClass8_0.<<RequestSentResponseDisposed_ThrowsOnServer>b__0>d.MoveNext() in /home/rzikm/dotnet-runtime/src/libraries/System.Net.Http/tests/FunctionalTests/HttpClientHandlerTest.Http3.cs:line 332
--- End of stack trace from previous location ---
   at System.Threading.Tasks.TaskTimeoutExtensions.GetRealException(Task task) in /home/rzikm/dotnet-runtime/src/libraries/Common/tests/System/Threading/Tasks/TaskTimeoutExtensions.cs:line 120
----- Inner Stack Trace #2 (System.Net.Quic.QuicOperationAbortedException) -----
   at System.Net.Quic.Implementations.MsQuic.MsQuicStream.SetupWriteStartState(Boolean emptyBuffer, CancellationToken cancellationToken) in /home/rzikm/dotnet-runtime/src/libraries/System.Net.Quic/src/System/Net/Quic/Implementations/MsQuic/MsQuicStream.cs:line 342
   at System.Net.Quic.Implementations.MsQuic.MsQuicStream.WriteAsync(ReadOnlyMemory`1 buffer, Boolean endStream, CancellationToken cancellationToken) in /home/rzikm/dotnet-runtime/src/libraries/System.Net.Quic/src/System/Net/Quic/Implementations/MsQuic/MsQuicStream.cs:line 307
   at System.Net.Test.Common.Http3LoopbackStream.SendFrameHeaderAsync(Int64 frameType, Int32 payloadLength) in /home/rzikm/dotnet-runtime/src/libraries/Common/tests/System/Net/Http/Http3LoopbackStream.cs:line 162
   at System.Net.Test.Common.Http3LoopbackStream.SendFrameAsync(Int64 frameType, ReadOnlyMemory`1 framePayload) in /home/rzikm/dotnet-runtime/src/libraries/Common/tests/System/Net/Http/Http3LoopbackStream.cs:line 167
   at System.Net.Test.Common.Http3LoopbackStream.SendDataFrameAsync(ReadOnlyMemory`1 data) in /home/rzikm/dotnet-runtime/src/libraries/Common/tests/System/Net/Http/Http3LoopbackStream.cs:line 140
   at System.Net.Test.Common.Http3LoopbackStream.SendResponseBodyAsync(Byte[] content, Boolean isFinal) in /home/rzikm/dotnet-runtime/src/libraries/Common/tests/System/Net/Http/Http3LoopbackStream.cs:line 283
   at System.Net.Http.Functional.Tests.HttpClientHandlerTest_Http3.<>c__DisplayClass8_0.<<RequestSentResponseDisposed_ThrowsOnServer>b__0>d.MoveNext() in /home/rzikm/dotnet-runtime/src/libraries/System.Net.Http/tests/FunctionalTests/HttpClientHandlerTest.Http3.cs:line 332
--- End of stack trace from previous location ---
   at System.Threading.Tasks.TaskTimeoutExtensions.GetRealException(Task task) in /home/rzikm/dotnet-runtime/src/libraries/Common/tests/System/Threading/Tasks/TaskTimeoutExtensions.cs:line 120

@ghost ghost added the area-System.Net.Quic label May 18, 2022
@ghost ghost assigned rzikm May 18, 2022
@ghost

ghost commented May 18, 2022

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: rzikm
Assignees: -
Labels:

area-System.Net.Quic

Milestone: -

@rzikm rzikm requested review from CarnaViire and ManickaP and removed request for ManickaP May 18, 2022 10:53
@rzikm
Member Author

rzikm commented May 18, 2022

To consider:

  • is there any perf penalty for using Volatile.Read? cc: @stephentoub
  • alternatively, we can consider removing the outside-of-the-lock check. That in turn will require us to make sure that the CancellationTokenRegistration is properly disposed if we throw later.
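
A rough sketch of the second option, assuming a simplified model rather than the real SetupWriteStartState: the state is checked only under the lock, and the cancellation registration is disposed if the check throws. StreamState, BeginWrite, and the use of InvalidOperationException in place of QuicStreamAbortedException are all hypothetical.

```csharp
using System;
using System.Threading;

enum SendState { None, Aborted }

// Simplified stand-in for the stream's shared state (hypothetical type).
sealed class StreamState
{
    public SendState SendState = SendState.None;
    public long SendErrorCode = -1;
}

static class LockedCheckSketch
{
    // Registers for cancellation, then validates the state under the lock.
    // If validation throws, the registration is disposed before the exception escapes.
    public static CancellationTokenRegistration BeginWrite(
        StreamState state, CancellationToken cancellationToken, Action onCanceled)
    {
        CancellationTokenRegistration registration = cancellationToken.Register(onCanceled);
        try
        {
            lock (state)
            {
                if (state.SendState == SendState.Aborted)
                {
                    // InvalidOperationException stands in for QuicStreamAbortedException.
                    throw new InvalidOperationException(
                        $"Stream aborted by peer with error code {state.SendErrorCode}");
                }
            }

            return registration;
        }
        catch
        {
            registration.Dispose();
            throw;
        }
    }

    public static void Main()
    {
        var state = new StreamState();
        lock (state)
        {
            state.SendErrorCode = 7;
            state.SendState = SendState.Aborted;
        }

        using var cts = new CancellationTokenSource();
        bool callbackRan = false;
        try
        {
            BeginWrite(state, cts.Token, () => callbackRan = true);
        }
        catch (InvalidOperationException ex)
        {
            Console.WriteLine(ex.Message);
        }

        // The registration was disposed on the failure path, so cancelling does not run the callback.
        cts.Cancel();
        Console.WriteLine(callbackRan);
    }
}
```

The trade-off is that every write takes the lock, even on a stream that is already in a terminal state.
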

@rzikm rzikm changed the title Fix data race on MsQuicStream abort Fix data race on incoming MsQuicStream abort May 18, 2022
@stephentoub
Member

is there any perf penalty for using Volatile.Read?

On x86/64, the memory model of the hardware is strong enough that no fences are emitted. The only impact is potentially prohibiting certain optimizations the JIT might otherwise do. Generally it's negligible.

On ARM, the hardware has a weaker memory model and the JIT does need to emit a fence in most situations.

state.SendErrorCode = (long)streamEvent.PEER_RECEIVE_ABORTED.ErrorCode;
// Make sure the SendErrorCode above is committed to memory before we assign the state. This
// ensures that the code is read correctly in SetupWriteStartState when checking without the lock.
Volatile.Write(ref Unsafe.As<SendState, int>(ref state.SendState), (int)SendState.Aborted);
Member

I don't think you need the Volatile.Write here since you're inside a lock. AFAIK that should emit memory barriers if necessary. @stephentoub please correct me if I'm wrong.

Member

The comment suggests the concern is the reordering of the write to SendErrorCode and the write to SendState. The volatile is necessary to prevent that reordering. There is a fence as part of releasing the lock, but that only guarantees that neither write will move past the lock exit, not that they won't move past each other.
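
A minimal sketch of the ordering discussed here, assuming a simplified model of the stream state: the writer stores the error code and then publishes the state with Volatile.Write, and the lock-free reader pairs it with Volatile.Read before touching the error code. StreamState, Abort, and TryGetAbortCode are hypothetical names; only the Volatile.Write line mirrors the snippet under review.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

enum SendState { None, Aborted }

// Simplified stand-in for the stream's shared state (hypothetical type).
sealed class StreamState
{
    public SendState SendState = SendState.None;
    public long SendErrorCode = -1;
}

static class PublishSketch
{
    // Writer side: the volatile write keeps the error code store from moving after the state store.
    public static void Abort(StreamState state, long errorCode)
    {
        lock (state)
        {
            state.SendErrorCode = errorCode;
            Volatile.Write(ref Unsafe.As<SendState, int>(ref state.SendState), (int)SendState.Aborted);
        }
    }

    // Reader side, no lock: the volatile read keeps the error code load from moving before the state load.
    public static bool TryGetAbortCode(StreamState state, out long errorCode)
    {
        var observed = (SendState)Volatile.Read(ref Unsafe.As<SendState, int>(ref state.SendState));
        if (observed == SendState.Aborted)
        {
            errorCode = state.SendErrorCode;
            return true;
        }

        errorCode = -1;
        return false;
    }

    public static void Main()
    {
        var state = new StreamState();
        Task writer = Task.Run(() => Abort(state, 42));

        // Spin until the abort becomes visible; the code read afterwards is the published one.
        long code;
        while (!TryGetAbortCode(state, out code))
        {
            Thread.Yield();
        }

        writer.Wait();
        Console.WriteLine(code);
    }
}
```

The Unsafe.As cast relies on SendState having the default int underlying type, since Volatile has no overload for enum fields.
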

Member

🤦 yep, I should have read that comment 🤣

Member

@ManickaP ManickaP left a comment

Thanks.

@rzikm
Member Author

rzikm commented May 18, 2022

Test failure is #69387

@rzikm rzikm merged commit 43cc4c1 into dotnet:main May 18, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Jun 17, 2022
@karelz karelz added this to the 7.0.0 milestone Jul 19, 2022