Skip to content

Conversation

@jdpurcell
Copy link
Contributor

I noticed this duplicated bit of logic inside the range decoder. Turns out, there's a helper function called Normalize2 that does the exact same thing and was never even called. This quirk came all the way from the original LZMA C# SDK it seems. I can see why, as with .NET Framework 4.8, attempting to use the helper function results in performance degradation, so someone manually inlined it. But that's not even necessary, as [MethodImpl(MethodImplOptions.AggressiveInlining)] can be used to achieve the same performance.

But the more interesting part, hence this PR, is that newer JITs (tested with .NET 8.0) don't like the manually inlined version of the code as much; when calling Normalize2 instead, it seems to get inlined either way (even without the attribute that we put as a hint for .NET 4.8), and the performance is better.

Core i7-6700k (3.2% reduction):

Method Mean Error StdDev
Before 1.128 s 0.0018 s 0.0015 s
After 1.092 s 0.0011 s 0.0009 s

Apple M3 (11.6% reduction):

Method Mean Error StdDev
Before 888.9 ms 16.78 ms 14.01 ms
After 785.8 ms 1.98 ms 1.85 ms

That's reduction in overall time to extract the whole archive. Tested with a smaller archive in BenchmarkDotNet and I was honestly in disbelief at first on the M3, but it's real. I also manually benchmarked using the same setup/archive as my first PR, it was a 412MB Qt 7z taking 26+ seconds, and it indeed shaved an entire 3 seconds off the extraction time.

Copy link
Owner

@adamhathcock adamhathcock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants