Skip to content

Conversation

@tannergooding
Copy link
Member

This adds support for containment on TYP_SIMD12 loads/stores and improves the codegen to require less temporary registers and use better instructions when available.

Improved Load:

- lea      rax, bword ptr [rcx+30H]
- vmovss   xmm0, dword ptr [rax+08H]
- vmovsd   xmm1, qword ptr [rax]
- vshufps  xmm1, xmm0, 68
+ vmovsd   xmm0, qword ptr [rcx+30H]
+ vinsertps xmm0, dword ptr [rcx+38H], 2

Improved Store:

- vmovsd   qword ptr [rdx], xmm1
- vpshufd  xmm0, xmm1, 2
- vmovss   dword ptr [rdx+08H], xmm0
+ vmovsd   qword ptr [rdx], xmm0
+ vextractps dword ptr [rdx+08H], xmm0, 2

Combined this saves 9 bytes of codegen and improves the PerScore by 1.5

Total diffs are all relatively similar. Emitting vmovsd + vinsertps or vmovsd + vextractps and removing now unnecessary lea in favor of containing them.

@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Dec 31, 2022
@ghost ghost assigned tannergooding Dec 31, 2022
@ghost
Copy link

ghost commented Dec 31, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.

Issue Details

This adds support for containment on TYP_SIMD12 loads/stores and improves the codegen to require less temporary registers and use better instructions when available.

Improved Load:

- lea      rax, bword ptr [rcx+30H]
- vmovss   xmm0, dword ptr [rax+08H]
- vmovsd   xmm1, qword ptr [rax]
- vshufps  xmm1, xmm0, 68
+ vmovsd   xmm0, qword ptr [rcx+30H]
+ vinsertps xmm0, dword ptr [rcx+38H], 2

Improved Store:

- vmovsd   qword ptr [rdx], xmm1
- vpshufd  xmm0, xmm1, 2
- vmovss   dword ptr [rdx+08H], xmm0
+ vmovsd   qword ptr [rdx], xmm0
+ vextractps dword ptr [rdx+08H], xmm0, 2

Combined this saves 9 bytes of codegen and improves the PerScore by 1.5

Total diffs are all relatively similar. Emitting vmovsd + vinsertps or vmovsd + vextractps and removing now unnecessary lea in favor of containing them.

Author: tannergooding
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@tannergooding
Copy link
Member Author

CC. @dotnet/jit-contrib, this should be ready for review.

Gives some small size savings for x64 (~2k bytes in fullopts and ~0.5k bytes in minopts) and a small TP win on x64

@tannergooding
Copy link
Member Author

/azp run runtime-coreclr jitstress-isas-x86, runtime-coreclr jitstress-isas-arm, runtime-coreclr outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 3 pipeline(s).

@tannergooding
Copy link
Member Author

/azp run runtime-coreclr jitstress-isas-x86, runtime-coreclr jitstress-isas-arm, runtime-coreclr outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 3 pipeline(s).

@tannergooding
Copy link
Member Author

Fixed the jitstress failure. Results in ~3.3k savings on x86/x64 and a -0.01% TP improvement

Copy link
Contributor

@TIHan TIHan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. There seems to be a failure for ARM though, but it looks like just one test case at the moment.

@tannergooding
Copy link
Member Author

There seems to be a failure for ARM though, but it looks like just one test case at the moment.

It's an unrelated/existing GC timeout. I've retriggered it and it should pass on rerun.

@tannergooding
Copy link
Member Author

/azp run runtime-coreclr jitstress-isas-x86, runtime-coreclr jitstress-isas-arm, runtime-coreclr outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 3 pipeline(s).

@runfoapp runfoapp bot mentioned this pull request Jan 6, 2023
@tannergooding tannergooding merged commit 87312f7 into dotnet:main Jan 6, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Feb 5, 2023
@tannergooding tannergooding deleted the better-simdcodegen branch July 1, 2025 14:40
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants