-
Notifications
You must be signed in to change notification settings - Fork 4.9k
Implement ArrayBufferWriter to provide an in-box ArrayPool-backed IBufferWriter<byte> #35094
Conversation
| public ArrayBufferWriter(int initialCapacity) { } | ||
| public int BytesAvailable { get { throw null; } } | ||
| public int BytesWritten { get { throw null; } } | ||
| public int Capacity { get { throw null; } } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BytesAvailable + BytesWritten == Capacity? Do we need BytesAvailable? What does one do with it? Why do two have the Bytes prefix and the third doesn't?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BytesAvailable + BytesWritten == Capacity?
Yes.
Do we need BytesAvailable? What does one do with it?
BytesAvailable is useful when the user wants to avoid resizing when they call GetSpan() since they can know up front if the data they want to write will fit or not in the leftover space.
|
API changes based on feedback:
|
| public void Clear() { } | ||
| public void Dispose() { } | ||
| public System.Memory<T> GetMemory(int sizeHint = 0) { throw null; } | ||
| public System.Span<T> GetSpan(int sizeHint = 0) { throw null; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably have another API review discussion about these names.
Thank you for making these implicitly implemented rather than explicitly. The crux of my feedback was that, without doing so or exposing similar functionality, the type was useless without being cast to the interface. The problem now, though, is that these names are a bit confusing, especially with OutputAsMemory and OutputAsSpan exposed on the type. We could still use explicit interface implementations if we wanted to and then expose this functionality as public methods with different names, in order to make them less confusing when using the public surface area of the type directly rather than via the interface. Or, we could leave them as is, and either accept the small confusion or potentially come up with different names for OutputAsMemory/Span.
| public int Capacity { get { throw null; } } | ||
| public int CurrentIndex { get { throw null; } } | ||
| public System.ReadOnlyMemory<T> OutputAsMemory { get { throw null; } } | ||
| public System.ReadOnlySpan<T> OutputAsSpan { get { throw null; } } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to expose both OutputAsMemory and OutputAsSpan? You can get a Span from a Memory, so the latter is equivalent to OutputAsMemory.Span. Given the typical use for this type, does the difference in perf between the two when you actually want a span make any measurable difference?
Same question goes for exposing GetSpan; as long as GetMemory is exposed, GetSpan could be implemented explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given my primary scenario for this was writing the data to a stream (asynchronously), we can defer adding OutputAsSpan until there is a customer request for it (Mem.Span is not noticeably slower (i.e. 15-25 ns) even for small writes like hello world since it's called infrequently).
Same question goes for exposing GetSpan; as long as GetMemory is exposed, GetSpan could be implemented explicitly.
However, I don't know if the same applies to GetSpan/GetMemory. If we expect ArrayBufferWriter to be used similar to how IBufferWriter is used today (i.e. we have Advance/GetMemory as regular interface implementations), we should be consistent and implement GetSpan that way as well. Writing some data to a span (synchronously) is faster than going through memory (especially if its a small amount). That was the primary reason to have GetSpan on the interface to begin with. I haven't measured myself but presumably the perf was significant enough to justify adding it, given how we currently use the API. It looks like we do this often with high granularity:
https://github.com/aspnet/AspNetCore/blob/68cc5e7846240bacc146854dfc6a175e30ff0f60/src/Servers/Kestrel/Core/src/Internal/BufferWriter.cs#L20
https://github.com/aspnet/AspNetCore/blob/b1f778bfb887f5c8282f5de42302c3f5736da790/src/Servers/Kestrel/Core/src/Internal/Http2/Http2FrameWriter.cs#L404
https://github.com/aspnet/AspNetCore/blob/3cd5054eb5d8851b25881cd05cccb931aaa022e6/src/SignalR/common/Shared/TextMessageFormatter.cs#L17
We only call GetMemory for async calls or interop where the work being done is expensive already:
https://github.com/aspnet/AspNetCore/blob/6d442c1e3d7875132c662a9e8c537e30cea585b8/src/Servers/Kestrel/Core/src/Adapter/Internal/AdaptedPipeline.cs#L119
https://github.com/aspnet/AspNetCore/blob/68cc5e7846240bacc146854dfc6a175e30ff0f60/src/Servers/Kestrel/Transport.Libuv/src/Internal/LibuvConnection.cs#L148
https://github.com/aspnet/AspNetCore/blob/ad11f890ef0646ecf3cefde63c9021865aa9490e/src/Servers/Kestrel/Transport.Sockets/src/Internal/SocketConnection.cs#L165
https://github.com/aspnet/AspNetCore/blob/3cd5054eb5d8851b25881cd05cccb931aaa022e6/src/SignalR/clients/csharp/Http.Connections.Client/src/Internal/WebSocketsTransport.cs#L222
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A benchmark like this would regress ~5%:
https://github.com/aspnet/AspNetCore/blob/dbf82dc8c4cdb410f036811143812db9f8bcc096/src/SignalR/perf/Microbenchmarks/HandshakeProtocolBenchmark.cs#L81
https://github.com/aspnet/AspNetCore/blob/dbf82dc8c4cdb410f036811143812db9f8bcc096/src/SignalR/common/SignalR.Common/src/Protocol/HandshakeProtocol.cs#L59-L70
BenchmarkDotNet=v0.11.3, OS=Windows 10.0.17763.253 (1809/October2018Update/Redstone5)
Intel Core i7-6700 CPU 3.40GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.0.100-preview-010174
[Host] : .NET Core 3.0.0-preview-27324-5 (CoreCLR 4.6.27322.0, CoreFX 4.7.19.7311), 64bit RyuJIT
Core : .NET Core 3.0.0-preview-27324-5 (CoreCLR 4.6.27322.0, CoreFX 4.7.19.7311), 64bit RyuJIT
Job=Core EnvironmentVariables=COMPlus_TieredCompilation=0 Runtime=Core
Toolchain=.NET Core 3.0
| Method | Mean | Error | StdDev | Ratio | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
|---|---|---|---|---|---|---|---|---|
| WriteArrayBufferWriterMemSpan | 257.4 ns | 2.561 ns | 2.396 ns | 1.05 | 0.0076 | - | - | 32 B |
| WriteArrayBufferWriterSpan | 245.8 ns | 2.107 ns | 1.645 ns | 1.00 | 0.0076 | - | - | 32 B |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😢
|
|
||
| public ArrayBufferWriter() | ||
| { | ||
| _rentedBuffer = ArrayPool<T>.Shared.Rent(MinimumBufferSize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rent on first use vs .ctor? e.g. could be _rentedBuffer = Array<T>.Empty();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would be the benefit? With this change, we end up going through the unnecessary resize logic when _rentedBuffer == 0 on first GetSpan/GetMemory (because sizeHint > availableSpace)
Instead of just renting the min sized array (what we have today), you end up copying an empty buffer, clearing it, and returning it to the pool.
|
I didn't realize that this was a single array at first. Is there any concern that this type basically shouldn't be used if buffering over 85K (LOH size?). What about a strategy to make a linked list once buffers get to that size instead of doubling? OR maybe that's a different type? |
ArrayPool buffers though; which go to 2MB? Shouldn't hold on to it longer than needed though... (its backing array) |
| <ItemGroup Condition="'$(TargetGroup)' == 'uap' OR '$(TargetGroup)' == 'netcoreapp'"> | ||
| <ItemGroup> | ||
| <ProjectReference Include="..\..\System.Runtime\ref\System.Runtime.csproj" /> | ||
| <ProjectReference Include="..\..\System.Memory\ref\System.Memory.csproj" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @ericstj, can you please review the props/csproj changes related to System.Buffers? I removed the netstandard configuration since we only build this library in-box now, targeting .NET Core.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turns out I can't remove the netstandard configuration so conditionally including the new APIs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, we need to keep building System.Memory for netstandard
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, you don't need to. You can use a simple-name reference to get the version we restore from packages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you aren't shipping the netstandard config nor changing it removing it is correct. You should ensure that you harvest and bin place the old assembly (so that folks don't build against a new version that doesn't ship).
| <AllowUnsafeBlocks>true</AllowUnsafeBlocks> | ||
| <IsPartialFacadeAssembly>true</IsPartialFacadeAssembly> | ||
| <ExcludeResourcesImport>true</ExcludeResourcesImport> | ||
| <Configurations>netcoreapp-Unix-Debug;netcoreapp-Unix-Release;netcoreapp-Windows_NT-Debug;netcoreapp-Windows_NT-Release;netcoreappaot-Windows_NT-Debug;netcoreappaot-Windows_NT-Release;uap-Windows_NT-Debug;uap-Windows_NT-Release;uapaot-Windows_NT-Debug;uapaot-Windows_NT-Release</Configurations> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to keep building the aot configurations?
System.Memory for netstandard anymore.
| } | ||
|
|
||
| // Returns the rented buffer back to the pool | ||
| public void Dispose() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forgetting to Dispose the ArrayBufferWriter means the _rentedBuffer becomes a memory leak, right?
Should the implementation take care of that via a finalizer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a leak, it just means the memory won't be returned to ArrayPool<T>.Shared, and will instead be collected by the GC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, no problem then. Thanks for clarifying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, I followed this same pattern in my Sequence<T> class, and wished I could go back and remove the IDisposable interface. I called a new public method Reset() and the instance is reusable after that, allowing for recycling of everything allocated.
| <PropertyGroup> | ||
| <!-- Must match version supported by frameworks which support 4.0.* inbox. | ||
| Can be removed when API is added and this assembly is versioned to 4.1.* --> | ||
| <AssemblyVersion>4.0.2.0</AssemblyVersion> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should bump minor version of the netcoreapp assembly since it has more API than last version (as well as netstandard version)
| @@ -1,5 +1,64 @@ | |||
| <?xml version="1.0" encoding="utf-8"?> | |||
| <root> | |||
| <!-- | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why adding this garbage back? We shouldn't churn: either leave it out or keep it in but don't toggle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hadn't realized we previously removed it (#16284). I just added a new String resource in VS (used in an exception) and it auto-updated to add the boiler plate. I removed the unused string that was there previously.
Other resx file have it too:
https://github.com/dotnet/corefx/blob/master/src/System.Memory/src/Resources/Strings.resx
https://github.com/dotnet/corefx/blob/master/src/System.Text.Json/src/Resources/Strings.resx
https://github.com/dotnet/corefx/blob/master/src/System.IO.Pipelines/src/Resources/Strings.resx
@danmosemsft, should we be removing this comment? It looks like VS adds it back so not sure its worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My opinion is we shouldn't be fighting VS on it and shouldn't have removed them in the first place, since any modifications via the GUI end up adding it back. Agreeing with @ericstj, I'm fine putting it back as part of this change as long as we leave it that way. If VS ever allows us to configure the behavior, then we could set that configuration and do a one-time sweep to remove them from everywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I opened a bug to remove this 17 years ago. Until they do, there is no point fighting it.
| <ItemGroup Condition="'$(TargetGroup)' == 'uap' OR '$(TargetGroup)' == 'netcoreapp'"> | ||
| <ItemGroup> | ||
| <ProjectReference Include="..\..\System.Runtime\ref\System.Runtime.csproj" /> | ||
| <ProjectReference Include="..\..\System.Memory\ref\System.Memory.csproj" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you aren't shipping the netstandard config nor changing it removing it is correct. You should ensure that you harvest and bin place the old assembly (so that folks don't build against a new version that doesn't ship).
| netcoreappaot-WebAssembly; | ||
| netcoreapp-Unix; | ||
| uap-Windows_NT; | ||
| uapaot-Windows_NT; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? UAPAOT has a different core lib.
|
@ahsonkhan what is status & plan for this PR? I don't see update in last 3.5 weeks ... |
I am going to finish up an investigation on an alternate approach and update this in the next 2-3 days based on my findings. Thanks for the ping :) |
@ahsonkhan did you find an alternate approach or you planning on update this PR? |
|
any update? |
|
|
||
| Span<T> previousBuffer = oldBuffer.AsSpan(0, _index); | ||
| previousBuffer.CopyTo(_rentedBuffer); | ||
| previousBuffer.Clear(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Sequence<T> I check whether T is a reference type and only bother clearing the array when it is (to ensure the GC can collect those objects).
Not clearing value type arrays seems reasonable IMO. If there were "secrets" that shouldn't be leaked, they shouldn't use this type anyway since it doesn't "pin" the array and the GC could have moved objects around leaving old copies in memory.
| Debug.Assert(_rentedBuffer != null); | ||
|
|
||
| if (sizeHint < 0) | ||
| throw new ArgumentException(nameof(sizeHint)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't ArgumentOutOfRangeException be a more precise and appropriate exception type to throw?
|
|
||
| T[] oldBuffer = _rentedBuffer; | ||
|
|
||
| _rentedBuffer = ArrayPool<T>.Shared.Rent(newSize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found it useful both to accept an optional ArrayPool<T> instance from the user, and to allow the user to specify a minimum array length to rent. I would recommend this here as well.
For me (where I form a sequence of arrays) it prevents an IBufferWriter<T> consumer (e.g. a serialier) from allocating thousands of tiny arrays when it calls GetSpan(sizeof(some-small-struct)) a lot.
For your class, it would prevent a great many array rental and copy operations every time the one array grows by just a few bytes. Either that, or perhaps have a growing factor similar to List<T> so you always double the length or something like that.
|
@ahsonkhan if you don't have time to finish it now, let's close it, until you do have the time ;) Thanks! |
|
Sounds good. Closing for now, pending potential re-design. |
|
Although what is the new ETA of this change? |
Fixes https://github.com/dotnet/corefx/issues/34894
TODO:
Add tests for T != byte (char, reference type like string, struct, class).Figure out which assembly to expose this from. We decided oncc @safernSystem.Buffers.dll, but I am not sure if we can add APIs to that assembly or not givenArrayPool, the only API implemented in that assembly, got moved down to corelib (Remove System.Buffers package #32096). Currently, exposing this fromSystem.Memory.dll.corefx/src/System.Buffers/src/System.Buffers.csproj
Lines 9 to 11 in 53a33cf
Changes based on dotnet/apireviews#88:
BytesCommittedtoTotalBytesWrittenfor clarityCapacityandBytesAvailableIBufferWriter<byte>interface explicitly to hide the APIs on ArrayBufferWriter itself.ReadOnlyClear()toReset()andCopyToAndClear()toCopyToAndReset()based on feedback from @davidfowlcc @davidfowl, @KrzysztofCwalina, @stephentoub, @steveharter, @JeremyKuhne, @pakrym, @layomia