@JimBobSquarePants

Prerequisites

  • I have written a descriptive pull-request title
  • I have verified that there are no overlapping pull-requests open
  • I have verified that I am following the existing coding patterns and practice as demonstrated in the repository. These follow strict StyleCop rules 👮.
  • I have provided test coverage for my change (where applicable)

Description

Adds AVX variants of some of the Block8x8F methods used in the DCT classes. The code was already fast, so a 2x speedup isn't possible.
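For readers unfamiliar with the pattern, here is a minimal sketch of what an AVX variant with a scalar fallback can look like. It is not the code from this PR: `BlockMathSketch` and its `MultiplyInPlace` method are hypothetical names, and the block is modelled as a plain `Span<float>` of 64 values rather than the real `Block8x8F` struct. Only `Avx.IsSupported`, `Avx.Multiply`, `Vector256.Create` and `MemoryMarshal.Cast` are actual .NET Core 3.x APIs.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Hypothetical helper for illustration only; not part of ImageSharp.
public static class BlockMathSketch
{
    // Multiplies all 64 floats of an 8x8 block by a scalar, in place.
    public static void MultiplyInPlace(Span<float> block, float value)
    {
        if (block.Length != 64)
        {
            throw new ArgumentException("Expected an 8x8 block of 64 floats.", nameof(block));
        }

        if (Avx.IsSupported)
        {
            // Reinterpret the 64 floats as 8 rows of 8 lanes each.
            Span<Vector256<float>> rows = MemoryMarshal.Cast<float, Vector256<float>>(block);
            Vector256<float> scale = Vector256.Create(value);

            for (int i = 0; i < rows.Length; i++)
            {
                rows[i] = Avx.Multiply(rows[i], scale);
            }

            return;
        }

        // Scalar fallback for hardware (or runtime settings) without AVX.
        for (int i = 0; i < block.Length; i++)
        {
            block[i] *= value;
        }
    }
}
```

Calling `BlockMathSketch.MultiplyInPlace(data, 2f)` on a 64-element `float[]` doubles every element on either path. Checking `Avx.IsSupported` at the call site lets the JIT drop the branch that cannot run on the current machine.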

Managed to trim a little more off the JPEG benchmarks compared to #1374.

I also fixed some naming "Inplace" => "InPlace".

I think that's probably the most I can do in this area now. Colorspace translation and PixelOperations are the bottlenecks.

Benchmarks.

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19041.572 (2004/?/20H1)
Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=5.0.100-rc.2.20479.15
  [Host]          : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT
  AVX             : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT
  SSE             : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT

Runtime=.NET Core 3.1
| Method | Job | EnvironmentVariables | Mean | Error | StdDev | Ratio | RatioSD | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| MultiplyInPlaceScalar | AVX | Empty | 23.05 ns | 0.456 ns | 0.426 ns | 1.00 | 0.00 | - | - | - | - |
| MultiplyInPlaceScalar | SSE | COMPlus_EnableAVX=0 | 25.86 ns | 0.095 ns | 0.089 ns | 1.12 | 0.02 | - | - | - | - |

| Method | Job | EnvironmentVariables | Mean | Error | StdDev | Ratio | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---:|---:|---:|---:|---:|---:|---:|---:|
| MultiplyInPlaceBlock | AVX | Empty | 36.32 ns | 0.224 ns | 0.199 ns | 1.00 | - | - | - | - |
| MultiplyInPlaceBlock | SSE | COMPlus_EnableAVX=0 | 38.15 ns | 0.349 ns | 0.326 ns | 1.05 | - | - | - | - |

| Method | Job | EnvironmentVariables | Mean | Error | StdDev | Ratio | RatioSD | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| AddToAllInplace | AVX | Empty | 22.61 ns | 0.480 ns | 0.513 ns | 1.00 | 0.00 | - | - | - | - |
| AddToAllInplace | SSE | COMPlus_EnableAVX=0 | 24.22 ns | 0.334 ns | 0.296 ns | 1.07 | 0.04 | - | - | - | - |

| Method | TestImage | Mean | Error | StdDev | Ratio | RatioSD | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 'Decode Jpeg - System.Drawing' | Jpg/b(...)e.jpg [21] | 5.279 ms | 1.1265 ms | 0.0617 ms | 1.00 | 0.00 | - | - | - | 176 B |
| 'Decode Jpeg - ImageSharp' | Jpg/b(...)e.jpg [21] | 10.072 ms | 0.7810 ms | 0.0428 ms | 1.91 | 0.02 | - | - | - | 15918 B |
| 'Decode Jpeg - System.Drawing' | Jpg/b(...)f.jpg [28] | 14.557 ms | 21.0053 ms | 1.1514 ms | 1.00 | 0.00 | - | - | - | 176 B |
| 'Decode Jpeg - ImageSharp' | Jpg/b(...)f.jpg [28] | 25.332 ms | 6.8164 ms | 0.3736 ms | 1.75 | 0.13 | - | - | - | 16896 B |
| 'Decode Jpeg - System.Drawing' | Jpg/i(...)e.jpg [43] | 342.239 ms | 186.6256 ms | 10.2296 ms | 1.00 | 0.00 | - | - | - | 176 B |
| 'Decode Jpeg - ImageSharp' | Jpg/i(...)e.jpg [43] | 252.773 ms | 112.3915 ms | 6.1606 ms | 0.74 | 0.02 | - | - | - | 36022512 B |

@JimBobSquarePants JimBobSquarePants added this to the 1.1.0 milestone Oct 16, 2020
@JimBobSquarePants JimBobSquarePants requested a review from a team October 16, 2020 17:32
codecov bot commented Oct 16, 2020

Codecov Report

Merging #1390 into master will increase coverage by 0.03%.
The diff coverage is 98.19%.

@@            Coverage Diff             @@
##           master    #1390      +/-   ##
==========================================
+ Coverage   82.84%   82.87%   +0.03%     
==========================================
  Files         690      690              
  Lines       30848    30903      +55     
  Branches     3542     3545       +3     
==========================================
+ Hits        25555    25610      +55     
  Misses       4572     4572              
  Partials      721      721              
| Flag | Coverage Δ |
|---|---|
| #unittests | 82.87% <98.19%> (+0.03%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|---|---|
| ...arp/Formats/Jpeg/Components/Block8x8F.Generated.cs | 100.00% <ø> (ø) |
| ...rc/ImageSharp/Formats/Jpeg/Components/Block8x8F.cs | 92.42% <98.11%> (+1.17%) ⬆️ |
| .../Jpeg/Components/Decoder/JpegBlockPostProcessor.cs | 100.00% <100.00%> (ø) |
| ...rp/Formats/Jpeg/Components/FastFloatingPointDCT.cs | 100.00% <100.00%> (ø) |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update cd4f565...bedf6d5.

@antonfirsov antonfirsov left a comment

Cool stuff, good to see the last drops being squeezed out here!

@JimBobSquarePants JimBobSquarePants merged commit 60eba39 into master Oct 16, 2020
@JimBobSquarePants JimBobSquarePants deleted the js/block8x8f-optimizations branch October 16, 2020 23:23
JimBobSquarePants added a commit that referenced this pull request Mar 13, 2021
Optimize Block8x8F low hanging fruit and fix naming