Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
21 changes: 21 additions & 0 deletions src/ImageSharp/Common/Extensions/StreamExtensions.cs
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@
// Licensed under the Apache License, Version 2.0.

using System;
using System.Buffers;
using System.IO;

using SixLabors.ImageSharp.Memory;
Expand Down Expand Up @@ -82,5 +83,25 @@ public static void Write(this Stream stream, IManagedByteBuffer buffer)
{
stream.Write(buffer.Array, 0, buffer.Length());
}

#if NET472 || NETSTANDARD1_3 || NETSTANDARD2_0
// This is a port of the CoreFX implementation and is MIT Licensed: https://github.com/dotnet/coreclr/blob/c4dca1072d15bdda64c754ad1ea474b1580fa554/src/System.Private.CoreLib/shared/System/IO/Stream.cs#L770
public static void Write(this Stream stream, ReadOnlySpan<byte> buffer)
{
// This uses ArrayPool<byte>.Shared, rather than taking a MemoryAllocator,
// in order to match the signature of the framework method that exists in
// .NET Core.
byte[] sharedBuffer = ArrayPool<byte>.Shared.Rent(buffer.Length);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We usually prefer using our own memory management primitives like MemoryAllocator.AllocateManagedByteBuffer() instead of ArrayPool.Shared.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is matching a signature and implementation available in .NET Core. ArrayPool<byte>.Shared should generally be allocation-less for something like this (not necessarily for other T) since most of the framework is fairly dependent on it.

I'll run benchmarks on net472 as well though, in order to see what the overhead here is, if any, since it isn't applicable to netcoreapp2.1 (where the above benchmark was run).

Copy link
Member

@antonfirsov antonfirsov Mar 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our "allocator" is typically also an allocation free pool, but we definitely have an overhead because of nested virtual calls, and other infrastructure stuff.
Everything depends on the size of the array. It's not uncommon to have very large (> 1 MB) buffers in image processing, which works better with our allocator/pool in my experience.

However, for those buffers we should avoid invoking this extension method anyways. @JimBobSquarePants we can probably make an exception here, but we need to make sure the purpose is documented, or at least we remember it 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely and I don't recall if benchmark.net isolates each iteration or not (I know it does for separate tests), so it could be that the benchmark is "lying" for full framework and it is hiding the allocation cost (so it might not be representative of real world scenarios).

It would be possible to take a MemoryAllocator as an optional parameter or to not have SosHeaderYCbCr take advantage of this compiler optimization (the single copy on class instantiation isn't terrible, but losing out on permanent pinning is a bit unfortunate) or to only not use the optimization on full framework if there are concerns about this regressing full framework.

IIRC, the ArrayPool<byte>.Shared is optimized for up to 2MB, since we use it for things like the JSON and XML readers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a comment explaining why we use that pool (signature matching) and move on.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Also rebased onto the current HEAD.

try
{
buffer.CopyTo(sharedBuffer);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me it looks like we are loosing all the benefits of the copy-free initialization at this line...

Copy link
Member

@antonfirsov antonfirsov Mar 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the benchmarks show different results ... I wonder why? I guess this method is not a hot path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case of netcoreapp, the method is actually virtual and various stream types can override it to avoid the copy. I'll double check the numbers on full framework to see if it hurts anything (and if so, will look at what can be done).

stream.Write(sharedBuffer, 0, buffer.Length);
}
finally
{
ArrayPool<byte>.Shared.Return(sharedBuffer);
}
}
#endif
}
}
185 changes: 103 additions & 82 deletions src/ImageSharp/Formats/Jpeg/JpegEncoderCore.cs
Original file line number Diff line number Diff line change
Expand Up @@ -27,85 +27,6 @@ internal sealed unsafe class JpegEncoderCore
/// </summary>
private const int QuantizationTableCount = 2;

/// <summary>
/// Counts the number of bits needed to hold an integer.
/// </summary>
private static readonly uint[] BitCountLut =
{
0, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
8, 8, 8,
};

/// <summary>
/// The SOS (Start Of Scan) marker "\xff\xda" followed by 12 bytes:
/// - the marker length "\x00\x0c",
/// - the number of components "\x03",
/// - component 1 uses DC table 0 and AC table 0 "\x01\x00",
/// - component 2 uses DC table 1 and AC table 1 "\x02\x11",
/// - component 3 uses DC table 1 and AC table 1 "\x03\x11",
/// - the bytes "\x00\x3f\x00". Section B.2.3 of the spec says that for
/// sequential DCTs, those bytes (8-bit Ss, 8-bit Se, 4-bit Ah, 4-bit Al)
/// should be 0x00, 0x3f, 0x00&lt;&lt;4 | 0x00.
/// </summary>
private static readonly byte[] SosHeaderYCbCr =
{
JpegConstants.Markers.XFF, JpegConstants.Markers.SOS,

// Marker
0x00, 0x0c,

// Length (high byte, low byte), must be 6 + 2 * (number of components in scan)
0x03, // Number of components in a scan, 3
0x01, // Component Id Y
0x00, // DC/AC Huffman table
0x02, // Component Id Cb
0x11, // DC/AC Huffman table
0x03, // Component Id Cr
0x11, // DC/AC Huffman table
0x00, // Ss - Start of spectral selection.
0x3f, // Se - End of spectral selection.
0x00

// Ah + Ah (Successive approximation bit position high + low)
};

/// <summary>
/// The unscaled quantization tables in zig-zag order. Each
/// encoder copies and scales the tables according to its quality parameter.
/// The values are derived from section K.1 after converting from natural to
/// zig-zag order.
/// </summary>
private static readonly byte[,] UnscaledQuant =
{
{
// Luminance.
16, 11, 12, 14, 12, 10, 16, 14, 13, 14, 18, 17, 16, 19, 24,
40, 26, 24, 22, 22, 24, 49, 35, 37, 29, 40, 58, 51, 61, 60,
57, 51, 56, 55, 64, 72, 92, 78, 64, 68, 87, 69, 55, 56, 80,
109, 81, 87, 95, 98, 103, 104, 103, 62, 77, 113, 121, 112,
100, 120, 92, 101, 103, 99,
},
{
// Chrominance.
17, 18, 18, 24, 21, 24, 47, 26, 26, 47, 99, 66, 56, 66,
99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99,
99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99,
99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99,
99, 99, 99, 99, 99, 99, 99, 99,
}
};

/// <summary>
/// A scratch buffer to reduce allocations.
/// </summary>
Expand Down Expand Up @@ -167,6 +88,103 @@ public JpegEncoderCore(IJpegEncoderOptions options)
this.subsample = options.Subsample;
}

/// <summary>
/// Gets the counts the number of bits needed to hold an integer.
/// </summary>
// The C# compiler emits this as a compile-time constant embedded in the PE file.
// This is effectively compiled down to: return new ReadOnlySpan<byte>(&data, length)
// More details can be found: https://github.com/dotnet/roslyn/pull/24621
private static ReadOnlySpan<byte> BitCountLut => new byte[]
{
0, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
8, 8, 8,
};

/// <summary>
/// Gets the SOS (Start Of Scan) marker "\xff\xda" followed by 12 bytes:
/// - the marker length "\x00\x0c",
/// - the number of components "\x03",
/// - component 1 uses DC table 0 and AC table 0 "\x01\x00",
/// - component 2 uses DC table 1 and AC table 1 "\x02\x11",
/// - component 3 uses DC table 1 and AC table 1 "\x03\x11",
/// - the bytes "\x00\x3f\x00". Section B.2.3 of the spec says that for
/// sequential DCTs, those bytes (8-bit Ss, 8-bit Se, 4-bit Ah, 4-bit Al)
/// should be 0x00, 0x3f, 0x00&lt;&lt;4 | 0x00.
/// </summary>
// The C# compiler emits this as a compile-time constant embedded in the PE file.
// This is effectively compiled down to: return new ReadOnlySpan<byte>(&data, length)
// More details can be found: https://github.com/dotnet/roslyn/pull/24621
private static ReadOnlySpan<byte> SosHeaderYCbCr => new byte[]
{
JpegConstants.Markers.XFF, JpegConstants.Markers.SOS,

// Marker
0x00, 0x0c,

// Length (high byte, low byte), must be 6 + 2 * (number of components in scan)
0x03, // Number of components in a scan, 3
0x01, // Component Id Y
0x00, // DC/AC Huffman table
0x02, // Component Id Cb
0x11, // DC/AC Huffman table
0x03, // Component Id Cr
0x11, // DC/AC Huffman table
0x00, // Ss - Start of spectral selection.
0x3f, // Se - End of spectral selection.
0x00

// Ah + Ah (Successive approximation bit position high + low)
};

/// <summary>
/// Gets the unscaled quantization tables in zig-zag order. Each
/// encoder copies and scales the tables according to its quality parameter.
/// The values are derived from section K.1 after converting from natural to
/// zig-zag order.
/// </summary>
// The C# compiler emits this as a compile-time constant embedded in the PE file.
// This is effectively compiled down to: return new ReadOnlySpan<byte>(&data, length)
// More details can be found: https://github.com/dotnet/roslyn/pull/24621
private static ReadOnlySpan<byte> UnscaledQuant_Luminance => new byte[]
{
// Luminance.
16, 11, 12, 14, 12, 10, 16, 14, 13, 14, 18, 17, 16, 19, 24,
40, 26, 24, 22, 22, 24, 49, 35, 37, 29, 40, 58, 51, 61, 60,
57, 51, 56, 55, 64, 72, 92, 78, 64, 68, 87, 69, 55, 56, 80,
109, 81, 87, 95, 98, 103, 104, 103, 62, 77, 113, 121, 112,
100, 120, 92, 101, 103, 99,
};

/// <summary>
/// Gets the unscaled quantization tables in zig-zag order. Each
/// encoder copies and scales the tables according to its quality parameter.
/// The values are derived from section K.1 after converting from natural to
/// zig-zag order.
/// </summary>
// The C# compiler emits this as a compile-time constant embedded in the PE file.
// This is effectively compiled down to: return new ReadOnlySpan<byte>(&data, length)
// More details can be found: https://github.com/dotnet/roslyn/pull/24621
private static ReadOnlySpan<byte> UnscaledQuant_Chrominance => new byte[]
{
// Chrominance.
17, 18, 18, 24, 21, 24, 47, 26, 26, 47, 99, 66, 56, 66,
99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99,
99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99,
99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99,
99, 99, 99, 99, 99, 99, 99, 99,
};

/// <summary>
/// Encode writes the image to the jpeg baseline format with the given options.
/// </summary>
Expand Down Expand Up @@ -259,9 +277,12 @@ private static void WriteDataToDqt(byte[] dqt, ref int offset, QuantIndex i, ref
/// <param name="quant">The quantization table.</param>
private static void InitQuantizationTable(int i, int scale, ref Block8x8F quant)
{
DebugGuard.MustBeBetweenOrEqualTo(i, 0, 1, nameof(i));
var unscaledQuant = (i == 0) ? UnscaledQuant_Luminance : UnscaledQuant_Chrominance;

for (int j = 0; j < Block8x8F.Size; j++)
{
int x = UnscaledQuant[i, j];
int x = unscaledQuant[j];
x = ((x * scale) + 50) / 100;
if (x < 1)
{
Expand Down Expand Up @@ -357,7 +378,7 @@ private void EmitHuffRLE(HuffIndex index, int runLength, int value)
}
else
{
bt = 8 + BitCountLut[a >> 8];
bt = 8 + (uint)BitCountLut[a >> 8];
}

this.EmitHuff(index, (int)((uint)(runLength << 4) | bt));
Expand Down Expand Up @@ -871,7 +892,7 @@ private void WriteStartOfScan<TPixel>(Image<TPixel> image)
{
// TODO: Need a JpegScanEncoder<TPixel> class or struct that encapsulates the scan-encoding implementation. (Similar to JpegScanDecoder.)
// TODO: We should allow grayscale writing.
this.outputStream.Write(SosHeaderYCbCr, 0, SosHeaderYCbCr.Length);
this.outputStream.Write(SosHeaderYCbCr);

switch (this.subsample)
{
Expand Down