@SwapnilGaikwad
Contributor

@SwapnilGaikwad SwapnilGaikwad commented Aug 30, 2024

Fixes #106868
Fixes #106871
Fixes #106872

@a74nh @kunalspathak @dotnet/arm64-contrib @arch-arm64-sve @TIHan @amanasifkhalid

@ghost added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Aug 30, 2024
@dotnet-policy-service bot added the community-contribution label (Indicates that the PR has been added by a community member) Aug 30, 2024
@SwapnilGaikwad
Contributor Author

Added a few conditional select tests for reduction intrinsics.

Passes all stress tests.
===================Running default===================
------------------- {} -------------------
Passed test: _AdvSimd_Arm64_ro::JIT.HardwareIntrinsics.Arm._AdvSimd.Arm64.Program.AddAcross_Vector64_Byte() : 7
Passed test: _AdvSimd_Arm64_ro::JIT.HardwareIntrinsics.Arm._AdvSimd.Arm64.Program.AddAcross_Vector64_Int16() : 7
Passed test: _AdvSimd_Arm64_ro::JIT.HardwareIntrinsics.Arm._AdvSimd.Arm64.Program.AddAcross_Vector64_SByte() : 7
Passed test: _AdvSimd_Arm64_ro::JIT.HardwareIntrinsics.Arm._AdvSimd.Arm64.Program.AddAcross_Vector64_UInt16() : 7
Passed test: _AdvSimd_Arm64_ro::JIT.HardwareIntrinsics.Arm._AdvSimd.Arm64.Program.AddAcross_Vector128_Byte() : 7
Passed test: _AdvSimd_Arm64_ro::JIT.HardwareIntrinsics.Arm._AdvSimd.Arm64.Program.AddAcross_Vector128_Int16() : 7
Passed test: _AdvSimd_Arm64_ro::JIT.HardwareIntrinsics.Arm._AdvSimd.Arm64.Program.AddAcross_Vector128_Int32() : 7
Passed test: _AdvSimd_Arm64_ro::JIT.HardwareIntrinsics.Arm._AdvSimd.Arm64.Program.AddAcross_Vector128_SByte() : 7
Passed test: _AdvSimd_Arm64_ro::JIT.HardwareIntrinsics.Arm._AdvSimd.Arm64.Program.AddAcross_Vector128_UInt16() : 7
Passed test: _AdvSimd_Arm64_ro::JIT.HardwareIntrinsics.Arm._AdvSimd.Arm64.Program.AddAcross_Vector128_UInt32() : 7
Passed test: _AdvSimd_Arm64_ro::JIT.HardwareIntrinsics.Arm._AdvSimd.Arm64.Program.AddAcrossWidening_Vector64_Byte() : 7
Passed test: _AdvSimd_Arm64_ro::JIT.HardwareIntrinsics.Arm._AdvSimd.Arm64.Program.AddAcrossWidening_Vector64_Int16() : 7
Passed test: _AdvSimd_Arm64_ro::JIT.HardwareIntrinsics.Arm._AdvSimd.Arm64.Program.AddAcrossWidening_Vector64_SByte() : 7
Passed test: _AdvSimd_Arm64_ro::JIT.HardwareIntrinsics.Arm._AdvSimd.Arm64.Program.AddAcrossWidening_Vector64_UInt16() : 7
Passed test: _AdvSimd_Arm64_ro::JIT.HardwareIntrinsics.Arm._AdvSimd.Arm64.Program.AddAcrossWidening_Vector128_Byte() : 7
Passed test: _AdvSimd_Arm64_ro::JIT.HardwareIntrinsics.Arm._AdvSimd.Arm64.Program.AddAcrossWidening_Vector128_Int16() : 7
Passed test: _AdvSimd_Arm64_ro::JIT.HardwareIntrinsics.Arm._AdvSimd.Arm64.Program.AddAcrossWidening_Vector128_Int32() : 7
Passed test: _AdvSimd_Arm64_ro::JIT.HardwareIntrinsics.Arm._AdvSimd.Arm64.Program.AddAcrossWidening_Vector128_SByte() : 7
Passed test: _AdvSimd_Arm64_ro::JIT.HardwareIntrinsics.Arm._AdvSimd.Arm64.Program.AddAcrossWidening_Vector128_UInt16() : 7
Passed test: _AdvSimd_Arm64_ro::JIT.HardwareIntrinsics.Arm._AdvSimd.Arm64.Program.AddAcrossWidening_Vector128_UInt32() : 7
Passed test: _AdvSimd_Arm64_ro::JIT.HardwareIntrinsics.Arm._AdvSimd.Arm64.Program.MaxAcross_Vector64_Byte() : 7
Passed test: _AdvSimd_Arm64_ro::JIT.HardwareIntrinsics.Arm._AdvSimd.Arm64.Program.MaxAcross_Vector64_Int16() : 7
Passed test: _AdvSimd_Arm64_ro::JIT.HardwareIntrinsics.Arm._AdvSimd.Arm64.Program.MaxAcross_Vector64_SByte() : 7
Passed test: _AdvSimd_Arm64_ro::JIT.HardwareIntrinsics.Arm._AdvSimd.Arm64.Program.MaxAcross_Vector64_UInt16() : 7
Passed test: _AdvSimd_Arm64_ro::JIT.HardwareIntrinsics.Arm._AdvSimd.Arm64.Program.MaxAcross_Vector128_Byte() : 7
Passed test: _AdvSimd_Arm64_ro::JIT.HardwareIntrinsics.Arm._AdvSimd.Arm64.Program.MaxAcross_Vector128_Int16() : 7
Passed test: _AdvSimd_Arm64_ro::JIT.HardwareIntrinsics.Arm._AdvSimd.Arm64.Program.MaxAcross_Vector128_Int32() : 7
Passed test: _AdvSimd_Arm64_ro::JIT.HardwareIntrinsics.Arm._AdvSimd.Arm64.Program.MaxAcross_Vector128_SByte() : 7
Passed test: _AdvSimd_Arm64_ro::JIT.HardwareIntrinsics.Arm._AdvSimd.Arm64.Program.MaxAcross_Vector128_Single() : 7
Passed test: _AdvSimd_Arm64_ro::JIT.HardwareIntrinsics.Arm._AdvSimd.Arm64.Program.MaxAcross_Vector128_UInt16() : 7
Passed test: _AdvSimd_Arm64_ro::JIT.HardwareIntrinsics.Arm._AdvSimd.Arm64.Program.MaxAcross_Vector128_UInt32() : 7
Passed test: _AdvSimd_Arm64_ro::JIT.HardwareIntrinsics.Arm._AdvSimd.Arm64.Program.MaxNumberAcross_Vector128_Single() : 7
Passed test: _AdvSimd_Arm64_ro::JIT.HardwareIntrinsics.Arm._AdvSimd.Arm64.Program.MinAcross_Vector64_Byte() : 7
Passed test: _AdvSimd_Arm64_ro::JIT.HardwareIntrinsics.Arm._AdvSimd.Arm64.Program.MinAcross_Vector64_Int16() : 7
Passed test: _AdvSimd_Arm64_ro::JIT.HardwareIntrinsics.Arm._AdvSimd.Arm64.Program.MinAcross_Vector64_SByte() : 7
Passed test: _AdvSimd_Arm64_ro::JIT.HardwareIntrinsics.Arm._AdvSimd.Arm64.Program.MinAcross_Vector64_UInt16() : 7
Passed test: _AdvSimd_Arm64_ro::JIT.HardwareIntrinsics.Arm._AdvSimd.Arm64.Program.MinAcross_Vector128_Byte() : 7
Passed test: _AdvSimd_Arm64_ro::JIT.HardwareIntrinsics.Arm._AdvSimd.Arm64.Program.MinAcross_Vector128_Int16() : 7
Passed test: _AdvSimd_Arm64_ro::JIT.HardwareIntrinsics.Arm._AdvSimd.Arm64.Program.MinAcross_Vector128_Int32() : 7
Passed test: _AdvSimd_Arm64_ro::JIT.HardwareIntrinsics.Arm._AdvSimd.Arm64.Program.MinAcross_Vector128_SByte() : 7
Passed test: _AdvSimd_Arm64_ro::JIT.HardwareIntrinsics.Arm._AdvSimd.Arm64.Program.MinAcross_Vector128_Single() : 7
Passed test: _AdvSimd_Arm64_ro::JIT.HardwareIntrinsics.Arm._AdvSimd.Arm64.Program.MinAcross_Vector128_UInt16() : 7
Passed test: _AdvSimd_Arm64_ro::JIT.HardwareIntrinsics.Arm._AdvSimd.Arm64.Program.MinAcross_Vector128_UInt32() : 7
Passed test: _AdvSimd_Arm64_ro::JIT.HardwareIntrinsics.Arm._AdvSimd.Arm64.Program.MinNumberAcross_Vector128_Single() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_AddAcross_float() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_AddAcross_double() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_AddAcross_long_sbyte() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_AddAcross_long_short() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_AddAcross_long_int() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_AddAcross_long() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_AddAcross_ulong_byte() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_AddAcross_ulong_ushort() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_AddAcross_ulong_uint() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_AddAcross_ulong() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_AddSequentialAcross_float() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_AddSequentialAcross_double() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_AndAcross_sbyte() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_AndAcross_short() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_AndAcross_int() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_AndAcross_long() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_AndAcross_byte() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_AndAcross_ushort() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_AndAcross_uint() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_AndAcross_ulong() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_MaxAcross_float() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_MaxAcross_double() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_MaxAcross_sbyte() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_MaxAcross_short() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_MaxAcross_int() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_MaxAcross_long() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_MaxAcross_byte() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_MaxAcross_ushort() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_MaxAcross_uint() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_MaxAcross_ulong() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_MaxNumberAcross_float() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_MaxNumberAcross_double() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_MinAcross_float() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_MinAcross_double() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_MinAcross_sbyte() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_MinAcross_short() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_MinAcross_int() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_MinAcross_long() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_MinAcross_byte() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_MinAcross_ushort() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_MinAcross_uint() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_MinAcross_ulong() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_MinNumberAcross_float() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_MinNumberAcross_double() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_OrAcross_sbyte() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_OrAcross_short() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_OrAcross_int() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_OrAcross_long() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_OrAcross_byte() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_OrAcross_ushort() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_OrAcross_uint() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_OrAcross_ulong() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_XorAcross_sbyte() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_XorAcross_short() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_XorAcross_int() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_XorAcross_long() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_XorAcross_byte() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_XorAcross_ushort() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_XorAcross_uint() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_XorAcross_ulong() : 19
===================Running jitstress===================
------------------- {'JitMinOpts': '1'} -------------------
------------------- {'JitStress': '1'} -------------------
------------------- {'JitStress': '2'} -------------------
------------------- {'JitStress': '1', 'TieredCompilation': '1'} -------------------
------------------- {'JitStress': '2', 'TieredCompilation': '1'} -------------------
------------------- {'TailcallStress': '1'} -------------------
------------------- {'ReadyToRun': '0'} -------------------
===================Running jitstressregs===================
------------------- {'JitStressRegs': '1'} -------------------
------------------- {'JitStressRegs': '2'} -------------------
------------------- {'JitStressRegs': '3'} -------------------
------------------- {'JitStressRegs': '4'} -------------------
------------------- {'JitStressRegs': '8'} -------------------
------------------- {'JitStressRegs': '0x10'} -------------------
------------------- {'JitStressRegs': '0x80'} -------------------
------------------- {'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStressRegs': '0x2000'} -------------------
===================Running jitstress2-jitstressregs===================
------------------- {'JitStress': '2', 'JitStressRegs': '1'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '2'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '3'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '4'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '8'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x10'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x80'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x2000'} -------------------

Contributor

@a74nh a74nh left a comment

LGTM. Happy we've got the extra template testing added too.

Contributor

@amanasifkhalid amanasifkhalid left a comment

LGTM too. Thanks!

@amanasifkhalid amanasifkhalid merged commit 46c9a4f into dotnet:main Sep 3, 2024
// when the nestedOp is a reduce operation.

- if (nestedOp1->IsMaskAllBitsSet() &&
+ if (nestedOp1->IsMaskAllBitsSet() && !HWIntrinsicInfo::IsReduceOperation(nestedOp2Id) &&
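
A minimal scalar sketch in C++ of why the added reduction check matters. This is not JIT code: kLanes, Vec, Mask, AddAcrossModel, CndSelModel, Unfused and Folded are hypothetical names, and the model assumes that a predicated add reduction sums only the active lanes and writes the sum to lane 0 with the remaining lanes zeroed.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 4; // hypothetical fixed lane count for the model
using Vec  = std::array<std::int32_t, kLanes>;
using Mask = std::array<bool, kLanes>;

// Assumed model of a predicated add reduction: sum the active lanes into
// lane 0 and leave every other lane zero.
Vec AddAcrossModel(const Mask& pred, const Vec& value) {
    Vec result{};
    for (std::size_t i = 0; i < kLanes; ++i) {
        if (pred[i]) result[0] += value[i];
    }
    return result;
}

// Lane-wise select: take a[i] where mask[i] is set, otherwise b[i].
Vec CndSelModel(const Mask& mask, const Vec& a, const Vec& b) {
    Vec result{};
    for (std::size_t i = 0; i < kLanes; ++i) {
        result[i] = mask[i] ? a[i] : b[i];
    }
    return result;
}

// What the source expresses: reduce under an all-true mask, then select.
Vec Unfused(const Mask& mask, const Vec& value, const Vec& merge) {
    Mask allTrue;
    allTrue.fill(true);
    return CndSelModel(mask, AddAcrossModel(allTrue, value), merge);
}

// Model of folding the outer mask into the reduction: only the active
// lanes now take part in the sum, so the reduced value itself changes.
Vec Folded(const Mask& mask, const Vec& value, const Vec& merge) {
    return CndSelModel(mask, AddAcrossModel(mask, value), merge);
}

// Small self-check of the two forms on one input.
void RunExample() {
    Mask mask{true, false, true, false};
    Vec value{1, 2, 3, 4};
    Vec merge{9, 9, 9, 9};
    assert((Unfused(mask, value, merge) == Vec{10, 9, 0, 9}));
    assert((Folded(mask, value, merge) == Vec{4, 9, 0, 9}));
}
```

For an element-wise operation the two forms would agree lane by lane. For a reduction they differ in lane 0 (10 versus 4 in this model), which is why the combine is skipped when the nested operation is a reduction.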
Member

The reduction operations are notably a case where the pattern we want is somewhat inverted.

Rather than CndSel(mask, Op(value), merge), we instead want to check for Op(CndSel(mask, value, zero)) or similar.

We should have a tracking issue to ensure that happens for .NET 10.
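
A scalar sketch in C++ of the inverted pattern, Op(CndSel(mask, value, zero)), for an add reduction. All names here (kLanes, Vec, Mask, AddAcrossModel, CndSelModel, ReduceOfSelect, PredicatedReduce) are hypothetical, and the model assumes a predicated add reduction sums the active lanes into lane 0 and zeroes the rest. Under that assumption, zeroing the inactive lanes first and reducing every lane gives the same value as reducing under the mask directly, because zero is the identity for addition.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 4; // hypothetical fixed lane count for the model
using Vec  = std::array<std::int32_t, kLanes>;
using Mask = std::array<bool, kLanes>;

// Assumed model of a predicated add reduction: sum the active lanes into
// lane 0 and leave every other lane zero.
Vec AddAcrossModel(const Mask& pred, const Vec& value) {
    Vec result{};
    for (std::size_t i = 0; i < kLanes; ++i) {
        if (pred[i]) result[0] += value[i];
    }
    return result;
}

// Lane-wise select: take a[i] where mask[i] is set, otherwise b[i].
Vec CndSelModel(const Mask& mask, const Vec& a, const Vec& b) {
    Vec result{};
    for (std::size_t i = 0; i < kLanes; ++i) {
        result[i] = mask[i] ? a[i] : b[i];
    }
    return result;
}

// Op(CndSel(mask, value, zero)): zero the inactive lanes, then reduce all lanes.
Vec ReduceOfSelect(const Mask& mask, const Vec& value) {
    Mask allTrue;
    allTrue.fill(true);
    const Vec zero{};
    return AddAcrossModel(allTrue, CndSelModel(mask, value, zero));
}

// The single predicated reduction that the pattern above could become.
Vec PredicatedReduce(const Mask& mask, const Vec& value) {
    return AddAcrossModel(mask, value);
}

// Small self-check that both forms agree on one input.
void RunExample() {
    Mask mask{true, false, true, false};
    Vec value{1, 2, 3, 4};
    assert((ReduceOfSelect(mask, value) == Vec{4, 0, 0, 0}));
    assert((PredicatedReduce(mask, value) == ReduceOfSelect(mask, value)));
}
```

The equivalence depends on the fill value being the identity of the reduction. Zero works for add, or and xor; other reductions such as and, min and max would need a different fill value, which may be what "or similar" covers.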

Contributor

Being looked at in #101973

@SwapnilGaikwad SwapnilGaikwad deleted the github-sve-bug-fixes branch September 3, 2024 15:04
radekdoulik pushed a commit to radekdoulik/runtime that referenced this pull request Sep 6, 2024
@kunalspathak
Contributor

@amanasifkhalid - any reason why this was not backported to release/9.0?

@amanasifkhalid
Contributor

@kunalspathak sorry I missed this one -- I'll kick off the backport now

@amanasifkhalid
Contributor

/backport to release/9.0

@github-actions
Contributor

Started backporting to release/9.0: https://github.com/dotnet/runtime/actions/runs/10820916154

@github-actions
Contributor

@amanasifkhalid backporting to release/9.0 failed, the patch most likely resulted in conflicts:

$ git am --3way --empty=keep --ignore-whitespace --keep-non-patch changes.patch

Applying: Avoid combining conditional select for reduction instrinsics
Using index info to reconstruct a base tree...
M	src/coreclr/jit/hwintrinsic.h
M	src/coreclr/jit/hwintrinsiclistarm64sve.h
M	src/coreclr/jit/lowerarmarch.cpp
M	src/tests/Common/GenerateHWIntrinsicTests/GenerateHWIntrinsicTests_Arm.cs
Falling back to patching base and 3-way merge...
Auto-merging src/tests/Common/GenerateHWIntrinsicTests/GenerateHWIntrinsicTests_Arm.cs
Auto-merging src/coreclr/jit/lowerarmarch.cpp
CONFLICT (content): Merge conflict in src/coreclr/jit/lowerarmarch.cpp
Auto-merging src/coreclr/jit/hwintrinsiclistarm64sve.h
Auto-merging src/coreclr/jit/hwintrinsic.h
CONFLICT (content): Merge conflict in src/coreclr/jit/hwintrinsic.h
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config advice.mergeConflict false"
Patch failed at 0001 Avoid combining conditional select for reduction instrinsics
Error: The process '/usr/bin/git' failed with exit code 128

Please backport manually!

@github-actions
Contributor

@amanasifkhalid an error occurred while backporting to release/9.0, please check the run log for details!

Error: git am failed, most likely due to a merge conflict.

amanasifkhalid pushed a commit to amanasifkhalid/runtime that referenced this pull request Sep 11, 2024
amanasifkhalid pushed a commit to amanasifkhalid/runtime that referenced this pull request Sep 11, 2024
amanasifkhalid pushed a commit to amanasifkhalid/runtime that referenced this pull request Sep 11, 2024
jeffschwMSFT added a commit that referenced this pull request Sep 12, 2024
Co-authored-by: SwapnilGaikwad <[email protected]>
Co-authored-by: Jeff Schwartz <[email protected]>
jtschuster pushed a commit to jtschuster/runtime that referenced this pull request Sep 17, 2024
sirntar pushed a commit to sirntar/runtime that referenced this pull request Sep 30, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Oct 12, 2024
Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member

5 participants