Low-hanging fruit codegen optimizations #2401

gfoidl · 2023-03-13T17:41:46Z

Prerequisites

I have written a descriptive pull-request title
I have verified that there are no overlapping pull-requests open
I have verified that I am following the existing coding patterns and practice as demonstrated in the repository. These follow strict Stylecop rules 👮.
I have provided test coverage for my change (where applicable)

Description

Scanned through the code-base (Regex based search) and did some low-hanging fruit optimizations.

optimized division by constants
for Unsafe.Add usage, tweaked the code / loops so they don't emit sign-extending movs (movsxd -> mov or movzx)

Sorry for touching so many files...
Cf. #2397 (comment)

Cf. https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEBDAzgWwB8ABAJgEYBYAKGIAYACY8gOgCUBXAOwwEt8YLAJI8ovLrl5hcAbho1iAZiakGAYQYBvGg11NlXcRgYBZcgAojDADYwuAcwwALAJQMAvAD4bdx04YA9AwAajBgGNDkpAAcADwAZtYQ2BieLGoQ3Bhy1Hr6DIY8pqSWRbYOzm5eDOaFGC7mHEYu5X6BIWERUFFxicmp6Zk8OQC+QA=
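
As a rough illustration of the "division by constants" item, here is a minimal sketch. The class and method names are hypothetical and not taken from the PR. Signed division by a power of two has to round toward zero for negative inputs, which costs extra instructions. When the value is known to be non-negative, dividing as uint lets the JIT use a plain shift. The cast is only valid under that non-negative assumption.

```csharp
using System;

// Hypothetical names, for illustration only; not code from the PR.
static class DivisionSketch
{
    // Signed division: the JIT must correct the result for negative inputs
    // so that it rounds toward zero.
    public static int HalfSigned(int value) => value / 2;

    // Unsigned division: a single logical shift is enough.
    // Only equivalent to HalfSigned when value >= 0.
    public static int HalfUnsigned(int value) => unchecked((int)((uint)value / 2));
}
```

For non-negative inputs both methods agree (for example, 7 gives 3 in both); for negative inputs they differ, which is why the cast belongs only where the value is a length, count, or index.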

src/ImageSharp/Formats/Jpeg/Components/Decoder/ArithmeticScanDecoder.cs

JimBobSquarePants

Thanks for this... I've gone a fair way through but want to comment now.

I think we might need to take a step back and try to find a good balance between creating the best codegen and readability.

I see two issues.

How we're handling loops is inconsistent: sometimes we cast in different ways in the loop declaration, nint i = ... (nint)(uint) vs nint i = ... (uint), and sometimes we cast inside the loop.
We're doing casting during field assignment outside of hot paths. I don't want to introduce changes that will only have a very marginal effect while sacrificing readability in the process.

src/ImageSharp/Common/Helpers/HexConverter.cs

src/ImageSharp/Formats/Bmp/BmpEncoderCore.cs

src/ImageSharp/Formats/Jpeg/Components/Block8x8F.Generated.cs

src/ImageSharp/Formats/Png/PngScanlineProcessor.cs

src/ImageSharp/Formats/Tiff/Compression/TiffBaseCompression.cs

JimBobSquarePants · 2023-03-18T04:22:42Z

src/ImageSharp/Formats/Tiff/PhotometricInterpretation/WhiteIsZero1TiffColor{TPixel}.cs

        colorWhite.FromRgba32(Color.White);
        ref byte dataRef = ref MemoryMarshal.GetReference(data);
-        for (nint y = top; y < top + height; y++)
+        for (nint y = top; y < (nint)(uint)(top + height); y++)


In some places we haven't done the second explicit cast from uint
e.g. the loops are written as

for (nint y = top; y < (uint)(top + height); y++)

I think we should strive to be super consistent here.

Having another thought on this, I think using nuint is the best option.

the cast to uint is enough, so no need for (nint)(uint)

the compiler reports an error for for (nuint i = 0; i < span.Length; ++i), whereas nint i = 0 compiles without complaint, so with nuint it's impossible to miss an optimization

for 32-bit optimized code is emitted too

Hence I'll update the usages to nuint.
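
A minimal sketch of the nuint loop shape discussed in this thread. The class and method names are hypothetical, and the loop body is a plain sum chosen only to keep the example small.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical names, for illustration only; not code from the PR.
static class NuintLoopSketch
{
    public static int Sum(ReadOnlySpan<int> span)
    {
        ref int start = ref MemoryMarshal.GetReference(span);
        int sum = 0;

        // Per the discussion above, `i < span.Length` (nuint vs. int) is
        // rejected by the compiler, so the (uint) cast cannot be forgotten.
        for (nuint i = 0; i < (uint)span.Length; i++)
        {
            // The nuint index needs no sign extension before address math.
            sum += Unsafe.Add(ref start, i);
        }

        return sum;
    }
}
```

The single (uint) cast in the loop condition is the only conversion needed; the index itself is already pointer-sized.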

src/ImageSharp/Formats/Tiff/TiffDecoderCore.cs

gfoidl · 2023-03-20T21:37:23Z

src/ImageSharp/Formats/Webp/Lossless/LosslessUtils.cs

    public static void AddGreenToBlueAndRed(Span<uint> pixelData)
    {
-        if (Avx2.IsSupported)
+        if (Avx2.IsSupported && pixelData.Length >= 8)


Did the length check early and made a do-while-loop.
This prevents the creation* of the vector addGreenToBlueAndRedMaskAvx2 when not needed.
The same on other loops in this file.

* actually it's a memory load from the data-segment in recent .NET versions

I've seen similar in the recent CRC PR to the runtime.
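
The "length check first, then do-while" shape might look roughly like the sketch below. It uses the portable Vector<T> type instead of Avx2 so that it runs anywhere, and AddToAll is a hypothetical helper, not a method from LosslessUtils.

```csharp
using System;
using System.Numerics;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical helper, for illustration only; not code from the PR.
static class DoWhileSketch
{
    public static void AddToAll(Span<uint> data, uint value)
    {
        ref uint start = ref MemoryMarshal.GetReference(data);
        nuint i = 0;

        if (Vector.IsHardwareAccelerated && data.Length >= Vector<uint>.Count)
        {
            // Only materialized once at least one full vector will be processed.
            Vector<uint> addend = new(value);
            nuint lastVectorStart = (uint)(data.Length - Vector<uint>.Count);

            do
            {
                ref byte current = ref Unsafe.As<uint, byte>(ref Unsafe.Add(ref start, i));
                Vector<uint> v = Unsafe.ReadUnaligned<Vector<uint>>(ref current);
                Unsafe.WriteUnaligned(ref current, v + addend);
                i += (uint)Vector<uint>.Count;
            }
            while (i <= lastVectorStart);
        }

        // Scalar tail for the remaining elements (or for short inputs).
        for (; i < (uint)data.Length; i++)
        {
            Unsafe.Add(ref start, i) += value;
        }
    }
}
```

Because the length check guarantees one full iteration, the do-while avoids a redundant loop-entry test, and short inputs skip the vector setup entirely.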

gfoidl · 2023-03-20T21:41:47Z

src/ImageSharp/Common/Helpers/SimdUtils.HwIntrinsics.cs

+                nint u = n - m;

-                for (int i = 0; i < u; i += 4)
+                for (nint i = 0; i < u; i += 4)


Because of the n - m, the loop with nint is kept. With nuint one has to take care of underflow due to the subtraction.

Alternatives using nuint would be casting to a signed type or having an if + do-while looping construct. I think the nint as used here is simpler.

Yep, that makes sense. Thanks for the additional explanation!
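
To make the underflow concern concrete, here is a minimal sketch with hypothetical names. With a signed bound, a negative n - m simply skips the loop; converted to nuint, the same difference wraps to a huge value.

```csharp
using System;

// Hypothetical names, for illustration only; not code from the PR.
static class UnderflowSketch
{
    // Signed bound: a negative difference means the loop body never runs.
    public static int CountIterations(int n, int m)
    {
        nint u = n - m;
        int iterations = 0;
        for (nint i = 0; i < u; i += 4)
        {
            iterations++;
        }

        return iterations;
    }

    // Unsigned bound: a negative difference wraps around to a very large value,
    // so a loop using it would run far past the intended end.
    public static bool WrapsWhenUnsigned(int n, int m)
    {
        nuint u = unchecked((nuint)(n - m));
        return u > (uint)int.MaxValue;
    }
}
```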

gfoidl · 2023-03-20T21:42:30Z

src/ImageSharp/Common/Helpers/SimdUtils.Shuffle.cs

+            out uint p3,
+            out uint p2,
+            out uint p1,
+            out uint p0)


The uint here saves lots of casts elsewhere.

Yep. Very good!

gfoidl · 2023-03-20T21:44:39Z

src/ImageSharp/Formats/ImageExtensions.Save.tt

        source.Save(
            path,
-            encoder ?? source.GetConfiguration().ImageFormatsManager.FindEncoder(<#= fmt #>Format.Instance));
+            encoder ?? source.GetConfiguration().ImageFormatsManager.GetEncoder(<#= fmt #>Format.Instance));


This was a left-over from #2317?

Wow! I thought I'd redone these. Thanks!!

JimBobSquarePants · 2023-03-21T12:50:41Z

@gfoidl I'm still reviewing this. There's a LOT to read so it's gonna take me a few nights. Thanks for your patience.

gfoidl · 2023-03-21T13:03:40Z

No rush (and ultimately it's up to you when to review 😉).

The pain point with such a PR is lots of touched files. But on the pro side the code-base gets in a consistent and good shape (regarding the micro-optimizations).

JimBobSquarePants

This is looking really good! Just a few comments and questions.

src/ImageSharp/Common/Helpers/Shuffle/IComponentShuffle.cs

JimBobSquarePants · 2023-03-21T11:51:55Z

src/ImageSharp/Common/Helpers/Shuffle/IPad3Shuffle4.cs

        ref byte dBase = ref MemoryMarshal.GetReference(dest);

-        Shuffle.InverseMMShuffle(this.Control, out int p3, out int p2, out int p1, out int p0);
+        Shuffle.InverseMMShuffle(this.Control, out uint p3, out uint p2, out uint p1, out uint p0);


This is one of those. "Why didn't I think of this!" moments. 😁

Well, then it would be boring for me 😉.

src/ImageSharp/Common/Helpers/SimdUtils.FallbackIntrinsics128.cs

JimBobSquarePants · 2023-03-21T12:01:15Z

src/ImageSharp/Common/Helpers/SimdUtils.HwIntrinsics.cs

+                nint u = n - m;

-                for (int i = 0; i < u; i += 4)
+                for (nint i = 0; i < u; i += 4)


Yep, that makes sense. Thanks for the additional explanation!

JimBobSquarePants · 2023-03-21T12:02:16Z

src/ImageSharp/Common/Helpers/SimdUtils.Shuffle.cs

+            out uint p3,
+            out uint p2,
+            out uint p1,
+            out uint p0)


Yep. Very good!

JimBobSquarePants · 2023-03-21T12:34:27Z

src/ImageSharp/PixelFormats/PixelImplementations/Abgr32.cs

        // We can assign the Bgr24 value directly to last three bytes of this instance.
        ref byte thisRef = ref Unsafe.As<Abgr32, byte>(ref this);
-        ref byte thisRefFromB = ref Unsafe.AddByteOffset(ref thisRef, new IntPtr(1));
+        ref byte thisRefFromB = ref Unsafe.AddByteOffset(ref thisRef, 1);


Wouldn't this be better as (nuint)1u?

AddByteOffset has overloads for nuint and IntPtr, so the C# compiler picks the nuint overload and treats the literal 1 as a nuint constant. No cast is needed here.

src/ImageSharp/Processing/Processors/Binarization/BinaryThresholdProcessor{TPixel}.cs

src/ImageSharp/Processing/Processors/Convolution/ConvolutionProcessor{TPixel}.cs

src/ImageSharp/Processing/Processors/Convolution/EdgeDetectorCompassProcessor{TPixel}.cs

JimBobSquarePants · 2023-03-21T12:49:03Z

src/ImageSharp/Processing/Processors/Convolution/MedianRowOperation{TPixel}.cs

        for (int i = 0; i < this.kernelSize; i++)
        {
-            int currentYIndex = Unsafe.Add(ref sampleRowBase, i);
+            int currentYIndex = Unsafe.Add(ref sampleRowBase, (uint)i);


Cast in the outer loop for consistency?

Just below i is needed for slicing as int.
It would be more consistent, but it would also mean two casts instead of one in the loop body.

I have no strong opinion on what's better here. Consistency or fewer casts?

gfoidl

Feedback is addressed / commented.

Once you're happy with the changes I need to push another commit to fix cases like

ImageSharp/src/ImageSharp/Formats/Jpeg/Components/Encoder/ComponentProcessor.cs

Line 136 in e234b00

nuint count = (uint)(source.Length / Vector<float>.Count);

which I de-optimized in a previous commit of this PR (got the wrong casts here in order to emit fastest code).

JimBobSquarePants · 2023-03-24T00:58:38Z

Super happy with everything so far! Fire ahead 😄

gfoidl · 2023-03-24T10:47:29Z

Fire ahead 😄

Here we go: 2bbf1cb (only changed that division by vector length (which I unfortunately optimized, then later de-optimized, and with that commit optimized again -- now it's enough of that ping-pong 😉))

JimBobSquarePants

Amazing stuff. Thank you!

gfoidl added 7 commits March 13, 2023 13:17

Revised Unsafe.Add to avoid the sign-extending move

1920e28

Removed some bound checks for arr[0] indexing to get a reference

1faf5a5

Optimized division by constants

9b7e41f

Fixed warnings from CI

957ee98

Merge branch 'main' into codegen-optimizations

5d111db

Fixed bugs

bc61781

gfoidl commented Mar 13, 2023

src/ImageSharp/Formats/Jpeg/Components/Decoder/ArithmeticScanDecoder.cs

Fixed Bug Pt. II

deaabf1

JimBobSquarePants reviewed Mar 18, 2023

PR feedback + use nuint instead of nint

f746e68

gfoidl commented Mar 20, 2023

gfoidl added 2 commits March 20, 2023 22:56

Removed unnecessary comments

a95ab17

Switched from for-loop to if + do-while in Vp8LHistogram

d6aeba1

JimBobSquarePants reviewed Mar 22, 2023

gfoidl mentioned this pull request Mar 23, 2023

Port GrayscalConverter to Arm #2409

Merged

PR feedback

e234b00

gfoidl commented Mar 23, 2023

Merge branch 'main' into codegen-optimizations

84bad73

Fixed division by vector length

2bbf1cb

JimBobSquarePants reviewed Mar 24, 2023

JimBobSquarePants merged commit 0a1f05b into SixLabors:main Mar 24, 2023

gfoidl deleted the codegen-optimizations branch March 24, 2023 12:24

gfoidl mentioned this pull request Mar 24, 2023

Fixed wrong division hack #2413

Merged

4 tasks

dependabot bot mentioned this pull request Sep 29, 2025

Bump SixLabors.ImageSharp from 2.1.11 to 3.1.11 EvotecIT/OfficeIMO#1283

Closed

dependabot bot mentioned this pull request Oct 6, 2025

Bump SixLabors.ImageSharp from 2.1.11 to 3.1.11 EvotecIT/OfficeIMO#1296

Closed

dependabot bot mentioned this pull request Oct 13, 2025

Bump SixLabors.ImageSharp from 2.1.11 to 3.1.11 EvotecIT/OfficeIMO#1314

Closed

dependabot bot mentioned this pull request Nov 3, 2025

Bump SixLabors.ImageSharp from 2.1.11 to 3.1.12 EvotecIT/OfficeIMO#1339

Closed

This was referenced Nov 17, 2025

Bump SixLabors.ImageSharp from 2.1.11 to 3.1.12 EvotecIT/OfficeIMO#1362

Open

Bump SixLabors.ImageSharp from 2.1.11 to 3.1.12 yildirim-mehmet/onlineOfiice#8

Open
