Skip to content

Conversation

@JimBobSquarePants
Copy link
Member

@JimBobSquarePants JimBobSquarePants commented Jul 10, 2020

Prerequisites

  • I have written a descriptive pull-request title
  • I have verified that there are no overlapping pull-requests open
  • I have verified that I am following matches the existing coding patterns and practice as demonstrated in the repository. These follow strict Stylecop rules 👮.
  • I have provided test coverage for my change (where applicable)

Description

I've been tinkering about with this for a while...

We've used buffered reading via a stream wrapper in our JpegDecoder to reduce the overhead of single byte reads. You end up with a nice performance boost over native reading.

this.InputStream = new DoubleBufferedStreamReader(this.Configuration.MemoryAllocator, stream);

However, the shape of the class doesn't allow for transparent reuse with other decoders which is a must. The mileage this buffering gives for each encoder varies based upon individual implementation but the net benefit is worth the investment.

With that in mind I refactored the class to turn it into an actual stream that internally wraps our input stream and is transparent to any decoder implementation. A great benefit to this is that we can fix our buffered stream to correctly handle when a stream returns less than expected on Read(buffer, offset, count) Fixes #1268 .

The following benchmarks represent our new new implementation BufferedReadStream vs the old jpeg wrapper BufferedReadStreamWrap, reading a MemoryStream preloaded with a buffer, and reading an array directly.

Things to note:

We're a lot faster on .NET Core + than the native stream reader and ~match the wrapping implementation on those frameworks.

The performance loss against the wrapper implementation in .NETFX is due to the jitter being unable to inline the appropriate methods on a sealed, inheriting class, I've experimented with several implementations and there seems to be no way around it. Since we are testing against the fastest possible stream reading scenario and we are measuring in us I'm not concerned. Our decoder benchmarks on NET FX are unaffected.

Method Runtime Mean Error StdDev Ratio RatioSD Gen 0 Gen 1 Gen 2 Allocated
StandardStreamRead .NET 4.7.2 126.07 us 126.498 us 6.934 us 0.99 0.08 - - - -
BufferedReadStreamRead .NET 4.7.2 118.08 us 42.234 us 2.315 us 0.92 0.03 - - - -
BufferedReadStreamWrapRead .NET 4.7.2 45.33 us 22.833 us 1.252 us 0.35 0.00 - - - -
StandardStreamReadByte .NET 4.7.2 128.17 us 94.616 us 5.186 us 1.00 0.00 - - - -
BufferedReadStreamReadByte .NET 4.7.2 143.60 us 92.871 us 5.091 us 1.12 0.08 - - - -
BufferedReadStreamWrapReadByte .NET 4.7.2 32.72 us 53.708 us 2.944 us 0.26 0.02 - - - -
ArrayReadByte .NET 4.7.2 19.40 us 12.206 us 0.669 us 0.15 0.01 - - - -
StandardStreamRead .NET Core 2.1 84.82 us 55.983 us 3.069 us 0.75 0.15 - - - -
BufferedReadStreamRead .NET Core 2.1 49.62 us 27.253 us 1.494 us 0.44 0.08 - - - -
BufferedReadStreamWrapRead .NET Core 2.1 67.78 us 87.546 us 4.799 us 0.60 0.10 - - - -
StandardStreamReadByte .NET Core 2.1 115.81 us 382.107 us 20.945 us 1.00 0.00 - - - -
BufferedReadStreamReadByte .NET Core 2.1 16.32 us 6.123 us 0.336 us 0.14 0.02 - - - -
BufferedReadStreamWrapReadByte .NET Core 2.1 16.68 us 4.616 us 0.253 us 0.15 0.03 - - - -
ArrayReadByte .NET Core 2.1 15.13 us 60.763 us 3.331 us 0.14 0.05 - - - -
StandardStreamRead .NET Core 3.1 92.44 us 88.217 us 4.835 us 0.94 0.06 - - - -
BufferedReadStreamRead .NET Core 3.1 36.41 us 5.923 us 0.325 us 0.37 0.00 - - - -
BufferedReadStreamWrapRead .NET Core 3.1 37.22 us 9.628 us 0.528 us 0.38 0.01 - - - -
StandardStreamReadByte .NET Core 3.1 98.67 us 20.947 us 1.148 us 1.00 0.00 - - - -
BufferedReadStreamReadByte .NET Core 3.1 41.36 us 123.536 us 6.771 us 0.42 0.06 - - - -
BufferedReadStreamWrapReadByte .NET Core 3.1 39.11 us 93.894 us 5.147 us 0.40 0.05 - - - -
ArrayReadByte .NET Core 3.1 18.84 us 4.684 us 0.257 us 0.19 0.00 - - - -

}

[Fact]
[Fact(Skip = "TODO: Enable when someone tells me how this mocking stuff works.")]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tocsoft I really struggled to understand how this worked.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently the test verifies that the mock IImageDecoders instances Decode<Rgba32>() method is called with exactly 2 properties that have reference equality (I believe it reference equality anyway) to this.TopLevelConfiguration and this.DataStream (which will have been returned from a mock/fake file system configured in TopLevelConfiguration when its called with MockFilePath).

For your new code which now wraps all the streams with a BufferedReadStream you will need to change the this.DataStream parameter to something like It.Is<Stream>(x=>(BufferedReadStream)x.BaseStream == this.DataStream) which allows the 'verify' to swap in other sorts of checks. (would require the base stream to be exposed from the buffered read stream)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent. That fixed it!

@codecov
Copy link

codecov bot commented Jul 10, 2020

Codecov Report

Merging #1269 into master will decrease coverage by 0.01%.
The diff coverage is 82.57%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1269      +/-   ##
==========================================
- Coverage   82.68%   82.67%   -0.02%     
==========================================
  Files         697      697              
  Lines       30589    30656      +67     
  Branches     3460     3467       +7     
==========================================
+ Hits        25294    25344      +50     
- Misses       4593     4612      +19     
+ Partials      702      700       -2     
Flag Coverage Δ
#unittests 82.67% <82.57%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/ImageSharp/Formats/Bmp/BmpDecoderCore.cs 94.45% <ø> (ø)
src/ImageSharp/Formats/Gif/GifDecoderCore.cs 83.96% <ø> (ø)
...rmats/Jpeg/Components/Decoder/HuffmanScanBuffer.cs 94.80% <ø> (ø)
src/ImageSharp/Formats/Tga/TgaDecoderCore.cs 91.29% <ø> (ø)
src/ImageSharp/Formats/Gif/GifDecoder.cs 50.00% <46.15%> (ø)
src/ImageSharp/Formats/Png/PngDecoder.cs 48.14% <50.00%> (+0.52%) ⬆️
src/ImageSharp/Formats/Tga/TgaDecoder.cs 46.42% <50.00%> (+0.97%) ⬆️
src/ImageSharp/Formats/Jpeg/JpegDecoder.cs 48.38% <53.84%> (+0.23%) ⬆️
src/ImageSharp/Image.Decode.cs 87.30% <75.00%> (-1.04%) ⬇️
src/ImageSharp/IO/BufferedReadStream.cs 85.60% <85.60%> (ø)
... and 12 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7ee0207...e6c78c2. Read the comment docs.

@JimBobSquarePants
Copy link
Member Author

@SixLabors/core Any objections to this? I'm getting a twitchy merge finger.

Copy link
Member

@antonfirsov antonfirsov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment below + I think some Jpeg before/after numbers would be also helpful indicators.

internal struct HuffmanScanBuffer
{
private readonly DoubleBufferedStreamReader stream;
private readonly Stream stream;
Copy link
Member

@antonfirsov antonfirsov Jul 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably loose a some of the benchmarked performance gain here, since the instance used here is of the unsealed base class Stream, therefore the virtual methods being called on it are not inlined (unlike in benchmarks!).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn’t actually see any slowdown at all in jpeg but I’ll run/publish tests tomorrow

Copy link
Member Author

@JimBobSquarePants JimBobSquarePants Jul 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@antonfirsov Before and after benchmarks. Short run so plenty of error but nothing that stands out. Ideally I'd be able to use the concrete implementation but I always worry about exposing a new class and breaking changes.

Method TestImage Mean Error StdDev Ratio RatioSD Gen 0 Gen 1 Gen 2 Allocated
'Decode Jpeg - System.Drawing' Jpg/b(...)e.jpg [21] 5.123 ms 7.2206 ms 0.3958 ms 1.00 0.00 - - - 176 B
'Decode Jpeg - ImageSharp' Jpg/b(...)e.jpg [21] 12.219 ms 15.6786 ms 0.8594 ms 2.39 0.02 - - - 15904 B
'Decode Jpeg - System.Drawing' Jpg/b(...)f.jpg [28] 13.339 ms 0.5726 ms 0.0314 ms 1.00 0.00 - - - 176 B
'Decode Jpeg - ImageSharp' Jpg/b(...)f.jpg [28] 29.084 ms 6.2870 ms 0.3446 ms 2.18 0.02 - - - 16880 B
'Decode Jpeg - System.Drawing' Jpg/i(...)e.jpg [43] 329.528 ms 56.3769 ms 3.0902 ms 1.00 0.00 - - - 216 B
'Decode Jpeg - ImageSharp' Jpg/i(...)e.jpg [43] 268.073 ms 179.0279 ms 9.8131 ms 0.81 0.04 - - - 36022424 B
Method TestImage Mean Error StdDev Ratio RatioSD Gen 0 Gen 1 Gen 2 Allocated
'Decode Jpeg - System.Drawing' Jpg/b(...)e.jpg [21] 4.965 ms 0.6894 ms 0.0378 ms 1.00 0.00 - - - 176 B
'Decode Jpeg - ImageSharp' Jpg/b(...)e.jpg [21] 11.881 ms 0.3953 ms 0.0217 ms 2.39 0.02 - - - 15808 B
'Decode Jpeg - System.Drawing' Jpg/b(...)f.jpg [28] 13.548 ms 1.5905 ms 0.0872 ms 1.00 0.00 - - - 176 B
'Decode Jpeg - ImageSharp' Jpg/b(...)f.jpg [28] 28.147 ms 1.6038 ms 0.0879 ms 2.08 0.01 - - - 16816 B
'Decode Jpeg - System.Drawing' Jpg/i(...)e.jpg [43] 332.870 ms 22.0796 ms 1.2103 ms 1.00 0.00 - - - 176 B
'Decode Jpeg - ImageSharp' Jpg/i(...)e.jpg [43] 286.208 ms 732.9871 ms 40.1775 ms 0.86 0.12 - - - 36022360 B

Copy link
Member

@antonfirsov antonfirsov Jul 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's ~6% change for the largest image, which is suspicious. These are no microbenchmarks, so the result of a short run is relevant, unless there is other noise, which can ofc happen with local runs.

We have DecodeJpegParseStreamOnly as a relatively quick option reassess this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before

Method Job Runtime TestImage Mean Error StdDev Ratio RatioSD Gen 0 Gen 1 Gen 2 Allocated
'System.Drawing FULL' Job-COEKJJ .NET 4.7.2 Jpg/b(...)f.jpg [28] 16.09 ms 0.024 ms 0.001 ms 1.00 0.00 156.2500 - - 777784 B
JpegDecoderCore.ParseStream Job-COEKJJ .NET 4.7.2 Jpg/b(...)f.jpg [28] 17.96 ms 12.218 ms 0.670 ms 1.12 0.04 - - - 12544 B
'System.Drawing FULL' Job-UZXMLV .NET Core 2.1 Jpg/b(...)f.jpg [28] 15.25 ms 0.973 ms 0.053 ms 1.00 0.00 171.8750 - - 775208 B
JpegDecoderCore.ParseStream Job-UZXMLV .NET Core 2.1 Jpg/b(...)f.jpg [28] 12.52 ms 1.801 ms 0.099 ms 0.82 0.01 - - - 12416 B
'System.Drawing FULL' Job-PIDJAF .NET Core 3.1 Jpg/b(...)f.jpg [28] 14.43 ms 3.392 ms 0.186 ms 1.00 0.00 - - - 191 B
JpegDecoderCore.ParseStream Job-PIDJAF .NET Core 3.1 Jpg/b(...)f.jpg [28] 11.55 ms 1.149 ms 0.063 ms 0.80 0.01 - - - 12408 B

After

Method Runtime TestImage Mean Error StdDev Ratio Gen 0 Gen 1 Gen 2 Allocated
'System.Drawing FULL' .NET 4.7.2 Jpg/b(...)f.jpg [28] 13.52 ms 1.621 ms 0.089 ms 1.00 171.8750 - - 777789 B
JpegDecoderCore.ParseStream .NET 4.7.2 Jpg/b(...)f.jpg [28] 13.01 ms 1.466 ms 0.080 ms 0.96 - - - 12416 B
'System.Drawing FULL' .NET Core 2.1 Jpg/b(...)f.jpg [28] 13.64 ms 3.331 ms 0.183 ms 1.00 171.8750 - - 775208 B
JpegDecoderCore.ParseStream .NET Core 2.1 Jpg/b(...)f.jpg [28] 11.89 ms 0.374 ms 0.021 ms 0.87 - - - 12248 B
'System.Drawing FULL' .NET Core 3.1 Jpg/b(...)f.jpg [28] 13.20 ms 1.116 ms 0.061 ms 1.00 - - - 176 B
JpegDecoderCore.ParseStream .NET Core 3.1 Jpg/b(...)f.jpg [28] 12.09 ms 0.310 ms 0.017 ms 0.92 - - - 12246 B

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I've got a good way now actually.... I updated the internal interface for the core decoders to use the buffered stream. I'll have something to show very soon.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@antonfirsov Less of a difference than I thought. That's with all the core decoder directly referencing the concrete decoder.

Method Job Runtime TestImage Mean Error StdDev Ratio Gen 0 Gen 1 Gen 2 Allocated
'System.Drawing FULL' Job-RSRDJA .NET 4.7.2 Jpg/b(...)f.jpg [28] 13.28 ms 0.976 ms 0.053 ms 1.00 171.8750 - - 777761 B
JpegDecoderCore.ParseStream Job-RSRDJA .NET 4.7.2 Jpg/b(...)f.jpg [28] 13.54 ms 0.792 ms 0.043 ms 1.02 - - - 12416 B
'System.Drawing FULL' Job-QLXCST .NET Core 2.1 Jpg/b(...)f.jpg [28] 13.24 ms 2.348 ms 0.129 ms 1.00 171.8750 - - 775208 B
JpegDecoderCore.ParseStream Job-QLXCST .NET Core 2.1 Jpg/b(...)f.jpg [28] 11.72 ms 1.564 ms 0.086 ms 0.89 - - - 12352 B
'System.Drawing FULL' Job-KQKRGS .NET Core 3.1 Jpg/b(...)f.jpg [28] 13.08 ms 2.922 ms 0.160 ms 1.00 - - - 176 B
JpegDecoderCore.ParseStream Job-KQKRGS .NET Core 3.1 Jpg/b(...)f.jpg [28] 11.49 ms 0.771 ms 0.042 ms 0.88 - - - 12372 B

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On .NET Core 3.1:
11.55 (old) VS 12.09 (base Stream) VS 11.49 (concrete Stream)

I think it's quite good, or am I missing something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm looking at the ratios compared to before but error could be creeping in there. It's definitely faster

I'm gonna push with the concrete implementation as it should bring us the best of everything.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@antonfirsov OK. I think this is ready to review again now.

await encoder.EncodeAsync(image, stream).ConfigureAwait(false);
}
using var encoder = new PngEncoderCore(image.GetMemoryAllocator(), image.GetConfiguration(), new PngEncoderOptions(this));
return encoder.EncodeAsync(image, stream);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sure this one needs awaiting otherwise the encoder will be disposed before the Task is awaited/activated and things will possibly exploded, but only if we end up having to perform a CopyToAsync which is unlikely to be happening the unit test

Copy link
Member Author

@JimBobSquarePants JimBobSquarePants Jul 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll switch it back..... Done

Copy link
Member

@antonfirsov antonfirsov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only one finding, otherwise LGTM.

Comment on lines 56 to 67
while (count > 0)
{
var foo = new byte[count];
while (count > 0)
int bytesRead = stream.Read(buffer, 0, count);
if (bytesRead == 0)
{
int bytesRead = stream.Read(foo, 0, count);
if (bytesRead == 0)
{
break;
}

count -= bytesRead;
break;
}

count -= bytesRead;
}

ArrayPool<byte>.Shared.Return(buffer);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a try { ... } finally { ArrayPool<byte>.Shared.Return(buffer); } block.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sweet, will do.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

ms.Position = 0;
return decoder.Decode<TPixel>(ms);
}
=> Task.FromResult(decoder.Decode<TPixel>(stream));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was the MemoryStream copying logic an unnecessary duplicate here it's already has been done in Image.WithSeekableStreamAsync ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, exactly.


// Simulates a stream that can only return 1 byte at a time per read instruction.
// See https://github.com/SixLabors/ImageSharp/issues/1268
private class EvilStream : MemoryStream
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

/// <exception cref="ArgumentNullException"><paramref name="stream"/> is null.</exception>
/// <returns>A <see cref="Task"/> representing the asynchronous operation.</returns>
public static Task<IImageInfo> IdentifyAsync(this IImageDecoderInternals decoder, BufferedReadStream stream)
=> Task.FromResult(decoder.Identify(stream));
Copy link
Member

@antonfirsov antonfirsov Jul 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case of exception in Identify (and Load), we should return Task.FromException(ex) instead of failing to create the Task.

The fix is on the way, as I'm working on a PR for #1280, so don't bother with it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sweet. Will wait to see what you come up with then. 👍

@JimBobSquarePants JimBobSquarePants merged commit ef5e21b into master Jul 22, 2020
@JimBobSquarePants JimBobSquarePants deleted the js/buffered-read branch July 22, 2020 08:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Decoding failes on slow streams

5 participants