ARM64-SVE: Add Not, InsertIntoShiftedVector #103725

amanasifkhalid · 2024-06-19T20:14:20Z

Part of #99957. InsertIntoShiftedVector requires a new test template, _SveVecAndScalarOpTest.template. This template is a clone of _SveMasklessUnaryOpTestTemplate.template, except that a second argument of type T is passed to Sve.InsertIntoShiftedVector<T> and handled elsewhere accordingly.

I could've implemented InsertIntoShiftedVector without any special codegen, but the path for handling unpredicated instructions with two operands in CodeGen::genHWIntrinsic checks if the instruction is scalable before checking if it has RMW semantics (which this one does), and thus ends up calling the wrong emitIns* method. Existing instructions seem to be dependent on this order of checking, so I cannot flip the order of the if-elseif-else statement without breaking existing tests, and I wasn't sure if duplicating the RMW logic on the scalable path would be ideal if it's only for one instruction -- if we prefer to do that now, I can get rid of the special path for this intrinsic, and handle the scalable RMW case on the general path.
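For context, INSR shifts every element of the destination vector up by one lane (the highest element is discarded) and writes the scalar into lane 0. A minimal scalar model of that behavior, assuming a std::vector stands in for the scalable register; the name InsertIntoShiftedVectorModel is hypothetical and is not part of this PR or its test helpers:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar reference model of INSR (illustrative only, not the PR's helper):
// each element moves up one lane, the top element is dropped, and the
// scalar is written to lane 0.
template <typename T>
std::vector<T> InsertIntoShiftedVectorModel(std::vector<T> left, T right)
{
    assert(!left.empty()); // a vector register always has at least one lane

    // Walk from the top lane down so each source lane is read before it is overwritten.
    for (std::size_t i = left.size() - 1; i > 0; --i)
    {
        left[i] = left[i - 1];
    }
    left[0] = right;
    return left;
}
```

With this model, inserting 9 into {1, 2, 3, 4} gives {9, 1, 2, 3}.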

Not tests:

===================Running default===================
------------------- {} -------------------
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Not_sbyte() : 16
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Not_short() : 16
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Not_int() : 16
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Not_long() : 16
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Not_byte() : 16
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Not_ushort() : 16
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Not_uint() : 16
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Not_ulong() : 16
===================Running jitstress===================
------------------- {'JitMinOpts': '1'} -------------------
------------------- {'JitStress': '1'} -------------------
------------------- {'JitStress': '2'} -------------------
------------------- {'JitStress': '1', 'TieredCompilation': '1'} -------------------
------------------- {'JitStress': '2', 'TieredCompilation': '1'} -------------------
------------------- {'TailcallStress': '1'} -------------------
------------------- {'ReadyToRun': '0'} -------------------
===================Running jitstressregs===================
------------------- {'JitStressRegs': '1'} -------------------
------------------- {'JitStressRegs': '2'} -------------------
------------------- {'JitStressRegs': '3'} -------------------
------------------- {'JitStressRegs': '4'} -------------------
------------------- {'JitStressRegs': '8'} -------------------
------------------- {'JitStressRegs': '0x10'} -------------------
------------------- {'JitStressRegs': '0x80'} -------------------
------------------- {'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStressRegs': '0x2000'} -------------------
===================Running jitstress2-jitstressregs===================
------------------- {'JitStress': '2', 'JitStressRegs': '1'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '2'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '3'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '4'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '8'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x10'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x80'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x2000'} -------------------
===================Running default===================
------------------- {} -------------------
Passed test: _Sve_r::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Not_sbyte() : 16
Passed test: _Sve_r::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Not_short() : 16
Passed test: _Sve_r::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Not_int() : 16
Passed test: _Sve_r::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Not_long() : 16
Passed test: _Sve_r::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Not_byte() : 16
Passed test: _Sve_r::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Not_ushort() : 16
Passed test: _Sve_r::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Not_uint() : 16
Passed test: _Sve_r::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Not_ulong() : 16
===================Running jitstress===================
------------------- {'JitMinOpts': '1'} -------------------
------------------- {'JitStress': '1'} -------------------
------------------- {'JitStress': '2'} -------------------
------------------- {'JitStress': '1', 'TieredCompilation': '1'} -------------------
------------------- {'JitStress': '2', 'TieredCompilation': '1'} -------------------
------------------- {'TailcallStress': '1'} -------------------
------------------- {'ReadyToRun': '0'} -------------------
===================Running jitstressregs===================
------------------- {'JitStressRegs': '1'} -------------------
------------------- {'JitStressRegs': '2'} -------------------
------------------- {'JitStressRegs': '3'} -------------------
------------------- {'JitStressRegs': '4'} -------------------
------------------- {'JitStressRegs': '8'} -------------------
------------------- {'JitStressRegs': '0x10'} -------------------
------------------- {'JitStressRegs': '0x80'} -------------------
------------------- {'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStressRegs': '0x2000'} -------------------
===================Running jitstress2-jitstressregs===================
------------------- {'JitStress': '2', 'JitStressRegs': '1'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '2'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '3'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '4'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '8'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x10'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x80'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x2000'} -------------------

InsertIntoShiftedVector tests:

===================Running default===================
------------------- {} -------------------
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_InsertIntoShiftedVector_float() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_InsertIntoShiftedVector_double() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_InsertIntoShiftedVector_sbyte() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_InsertIntoShiftedVector_short() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_InsertIntoShiftedVector_int() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_InsertIntoShiftedVector_long() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_InsertIntoShiftedVector_byte() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_InsertIntoShiftedVector_ushort() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_InsertIntoShiftedVector_uint() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_InsertIntoShiftedVector_ulong() : 7
===================Running jitstress===================
------------------- {'JitMinOpts': '1'} -------------------
------------------- {'JitStress': '1'} -------------------
------------------- {'JitStress': '2'} -------------------
------------------- {'JitStress': '1', 'TieredCompilation': '1'} -------------------
------------------- {'JitStress': '2', 'TieredCompilation': '1'} -------------------
------------------- {'TailcallStress': '1'} -------------------
------------------- {'ReadyToRun': '0'} -------------------
===================Running jitstressregs===================
------------------- {'JitStressRegs': '1'} -------------------
------------------- {'JitStressRegs': '2'} -------------------
------------------- {'JitStressRegs': '3'} -------------------
------------------- {'JitStressRegs': '4'} -------------------
------------------- {'JitStressRegs': '8'} -------------------
------------------- {'JitStressRegs': '0x10'} -------------------
------------------- {'JitStressRegs': '0x80'} -------------------
------------------- {'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStressRegs': '0x2000'} -------------------
===================Running jitstress2-jitstressregs===================
------------------- {'JitStress': '2', 'JitStressRegs': '1'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '2'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '3'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '4'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '8'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x10'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x80'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x2000'} -------------------
===================Running default===================
------------------- {} -------------------
Passed test: _Sve_r::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_InsertIntoShiftedVector_float() : 7
Passed test: _Sve_r::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_InsertIntoShiftedVector_double() : 7
Passed test: _Sve_r::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_InsertIntoShiftedVector_sbyte() : 7
Passed test: _Sve_r::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_InsertIntoShiftedVector_short() : 7
Passed test: _Sve_r::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_InsertIntoShiftedVector_int() : 7
Passed test: _Sve_r::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_InsertIntoShiftedVector_long() : 7
Passed test: _Sve_r::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_InsertIntoShiftedVector_byte() : 7
Passed test: _Sve_r::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_InsertIntoShiftedVector_ushort() : 7
Passed test: _Sve_r::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_InsertIntoShiftedVector_uint() : 7
Passed test: _Sve_r::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_InsertIntoShiftedVector_ulong() : 7
===================Running jitstress===================
------------------- {'JitMinOpts': '1'} -------------------
------------------- {'JitStress': '1'} -------------------
------------------- {'JitStress': '2'} -------------------
------------------- {'JitStress': '1', 'TieredCompilation': '1'} -------------------
------------------- {'JitStress': '2', 'TieredCompilation': '1'} -------------------
------------------- {'TailcallStress': '1'} -------------------
------------------- {'ReadyToRun': '0'} -------------------
===================Running jitstressregs===================
------------------- {'JitStressRegs': '1'} -------------------
------------------- {'JitStressRegs': '2'} -------------------
------------------- {'JitStressRegs': '3'} -------------------
------------------- {'JitStressRegs': '4'} -------------------
------------------- {'JitStressRegs': '8'} -------------------
------------------- {'JitStressRegs': '0x10'} -------------------
------------------- {'JitStressRegs': '0x80'} -------------------
------------------- {'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStressRegs': '0x2000'} -------------------
===================Running jitstress2-jitstressregs===================
------------------- {'JitStress': '2', 'JitStressRegs': '1'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '2'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '3'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '4'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '8'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x10'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x80'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x2000'} -------------------

cc @dotnet/arm64-contrib

ghost · 2024-06-19T20:14:25Z

Note regarding the new-api-needs-documentation label:

This serves as a reminder: when your PR modifies a ref *.cs file and adds or modifies public APIs, please make sure the API implementation in the src *.cs file is documented with triple-slash comments, so the PR reviewers can sign off on that change.

dotnet-policy-service · 2024-06-19T20:14:57Z

Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics
See info in area-owners.md if you want to be subscribed.

kunalspathak

checks if the instruction is scalable before checking if it has RMW semantics (which this one does), and thus ends up calling the wrong emitIns* method.

Can you confirm the exact place where this is happening?

kunalspathak · 2024-06-20T12:56:26Z

...braries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Arm/Sve.PlatformNotSupported.cs

+        ///   INSR Ztied1.D, Xop2
+        ///   INSR Ztied1.D, Dop2
+        /// </summary>
+        public static unsafe Vector<double> InsertIntoShiftedVector(Vector<double> left, double right) { throw new PlatformNotSupportedException(); }


This API, as per the docs, implements both INSR - SIMD&FP and INSR - scalar, but it is not clear to me how we decide which one to pick. @a74nh - any idea?

kunalspathak · 2024-06-20T13:05:06Z

src/coreclr/jit/hwintrinsiccodegenarm64.cpp

+                if (targetReg != op1Reg)
+                {
+                    assert(targetReg != op2Reg);
+                    GetEmitter()->emitIns_Mov(INS_mov, emitTypeSize(node), targetReg, op1Reg,


Does this work with both op1Reg being gpr or SIMD/FP?

I think so, though I'm struggling to get the JIT to use a gpr for op1Reg under the stress modes -- I'm only ever getting the "easy" case, where op1Reg and targetReg differ and op1Reg is already a vector register, so we're moving from a vector reg to a vector reg. Since the first argument in Sve.InsertIntoShiftedVector<T> is of type Vector<T>, we'd expect op1Reg to always be a vector register, right? I could add an assert here to clarify this. (Though looking at emitIns_Mov, it does have a path for emitting mov instructions from a gpr to a vector register.)

Actually, I meant that for op2Reg, sorry. Ideally, when the second argument is a double or float, op2Reg should be a SIMD/floating-point register; otherwise it should be a gpr. Can you verify that, please? op1Reg should always be a scalable register, and we already assert that in the emitter.

No worries, I've added an assert to check this. Stress runs for both the unoptimized and optimized tests are still passing.

Do you mind sharing a small section of disassembly for both categories?

Not at all. Here's a snippet from the double tests:

    ldr     q16, [x0]
    ldr     d17, [fp, #0x30]    // [V01 loc0]
    insr    z16.d, d17

And from the uint tests:

    ldr     q16, [x0]
    ldr     w0, [fp, #0x34]    // [V01 loc0]
    insr    z16.s, w0
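The expectation checked in this thread (a SIMD/FP register for float and double scalars, a gpr for integer scalars, matching d17 and w0 in the two snippets above) can be sketched as a small predicate. This is only an illustration: BaseType, RegClass, ExpectedOp2Class and CheckOp2 are hypothetical names for this sketch and do not correspond to the JIT's own types or to the assert actually added in the PR.

```cpp
#include <cassert>

// Hypothetical enums for this sketch; the JIT has its own type and register descriptions.
enum class BaseType { SByte, Short, Int, Long, Byte, UShort, UInt, ULong, Float, Double };
enum class RegClass { GeneralPurpose, SimdFp };

constexpr bool IsFloating(BaseType t)
{
    return t == BaseType::Float || t == BaseType::Double;
}

// Register class the scalar operand (op2) is expected to occupy.
constexpr RegClass ExpectedOp2Class(BaseType t)
{
    return IsFloating(t) ? RegClass::SimdFp : RegClass::GeneralPurpose;
}

// Debug-style check in the spirit of the assert discussed above.
inline void CheckOp2(BaseType t, RegClass actual)
{
    assert(actual == ExpectedOp2Class(t));
}
```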

amanasifkhalid · 2024-06-20T14:56:15Z

Can you confirm the exact place where this is happening?

Sure; right here. Notice how in the if-else statement, we check if the instruction is scalable before we check if it has RMW semantics. insr has both flags, so we end up calling emitIns_R_R_R for it, when we should be calling emitIns_R_R (and doing any necessary moves beforehand).
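The ordering problem described above can be reproduced with a toy model. The sketch below is not the JIT code: InsTraits, EmitPath and both functions are hypothetical, and the final fallback branch is an assumption made only so the example is complete. It shows that an instruction carrying both flags never reaches the RMW branch when the scalable check comes first.

```cpp
#include <cassert>

// Hypothetical flags standing in for the two instruction properties discussed above.
struct InsTraits
{
    bool isScalable; // SVE (scalable vector) instruction
    bool hasRmw;     // destination register doubles as the first source
};

enum class EmitPath { R_R_R, MovThenR_R };

// Order described above: scalable is tested first, so an instruction that is
// both scalable and RMW takes the three-register path.
inline EmitPath PickScalableFirst(InsTraits t)
{
    if (t.isScalable)
    {
        return EmitPath::R_R_R;
    }
    else if (t.hasRmw)
    {
        return EmitPath::MovThenR_R;
    }
    return EmitPath::R_R_R; // assumed fallback for this sketch
}

// Alternative order: RMW is tested first.
inline EmitPath PickRmwFirst(InsTraits t)
{
    return t.hasRmw ? EmitPath::MovThenR_R : EmitPath::R_R_R;
}

// insr carries both flags, so the two orders disagree.
inline void CheckInsrExample()
{
    const InsTraits insr{true, true};
    assert(PickScalableFirst(insr) == EmitPath::R_R_R);
    assert(PickRmwFirst(insr) == EmitPath::MovThenR_R);
}
```

As the PR description notes, the change does not flip the order, because existing instructions appear to depend on it; InsertIntoShiftedVector gets a dedicated path instead.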

amanasifkhalid · 2024-06-20T15:37:00Z

I made a small change to InsertIntoShiftedVector's test helper to avoid an extra array allocation.

amanasifkhalid · 2024-06-21T14:44:42Z

@kunalspathak @a74nh does this PR need anything else?

kunalspathak · 2024-06-21T15:06:29Z

Sure; right here.

Your change looks good, except I realized that after #103620 we no longer need the condition in the code below, because both branches do the same thing.

if (HWIntrinsicInfo::IsExplicitMaskedOperation(intrin.id))
{
    GetEmitter()->emitIns_R_R_R(ins, emitSize, targetReg, op1Reg, op2Reg, opt);
}
else
{
    // This generates an unpredicated version
    // Implicitly predicated should be taken care above `intrin.op2->IsEmbMaskOp()`
    GetEmitter()->emitIns_R_R_R(ins, emitSize, targetReg, op1Reg, op2Reg, opt);
}

kunalspathak

LGTM

amanasifkhalid · 2024-06-21T15:09:17Z

Your change looks good, except I realized that after #103620 we no longer need the condition in the code below, because both branches do the same thing.

Got it, I'll fix that in my next PR.

amanasifkhalid · 2024-06-21T15:10:41Z

/ba-g Build Analysis blocked by timeouts

amanasifkhalid added 2 commits June 19, 2024 12:00

Add not

9b676e8

Add InsertIntoShiftedVector

0d8e3c6

ghost added area-System.Runtime.Intrinsics new-api-needs-documentation labels Jun 19, 2024

amanasifkhalid added the arm-sve Work related to arm64 SVE/SVE2 support label Jun 19, 2024

dotnet-policy-service bot assigned amanasifkhalid Jun 19, 2024

amanasifkhalid mentioned this pull request Jun 19, 2024

Arm64: Implement SVE APIs #99957

Closed

kunalspathak suggested changes Jun 20, 2024

Simplify InsertIntoShiftedVector helper

d933e7a

Add assert

f390749

build-analysis bot mentioned this pull request Jun 20, 2024

Crash in Microsoft.Extensions.Logging.Generators.Roslyn4.0.Tests.WorkItemExecution #90019

Open

kunalspathak approved these changes Jun 21, 2024

amanasifkhalid merged commit 18bc115 into dotnet:main Jun 21, 2024

amanasifkhalid deleted the sve-bitwise-not branch June 21, 2024 15:11

rzikm pushed a commit to rzikm/dotnet-runtime that referenced this pull request Jun 24, 2024

ARM64-SVE: Add Not, InsertIntoShiftedVector (dotnet#103725)

2d8e526

github-actions bot locked and limited conversation to collaborators Jul 23, 2024
