- 
                Notifications
    
You must be signed in to change notification settings  - Fork 5.2k
 
ARM64-SVE: Add Not, InsertIntoShiftedVector #103725
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| 
           Note regarding the   | 
    
    
      
        1 similar comment
      
    
  
    | 
           Note regarding the   | 
    
| 
           Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics  | 
    
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
checks if the instruction is scalable before checking if it has RMW semantics (which this one does), and thus ends up calling the wrong emitIns* method.
Can you confirm the exact place where this is happening?
| /// INSR Ztied1.D, Xop2 | ||
| /// INSR Ztied1.D, Dop2 | ||
| /// </summary> | ||
| public static unsafe Vector<double> InsertIntoShiftedVector(Vector<double> left, double right) { throw new PlatformNotSupportedException(); } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This API, as per the docs is implementing both INSR - SIMD&FP and INSR - scalar, but it is not clear to me, how we decide which one to pick. @a74nh - any idea?
| if (targetReg != op1Reg) | ||
| { | ||
| assert(targetReg != op2Reg); | ||
| GetEmitter()->emitIns_Mov(INS_mov, emitTypeSize(node), targetReg, op1Reg, | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this work with both op1Reg being gpr or SIMD/FP?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so, though I'm struggling to get the JIT to use a gpr for op1Reg under the stress modes -- I'm only ever getting the "easy" case, where when op1Reg and targetReg differ, op1Reg is already a vector register, so we're moving from a vector reg to a vector reg. Since the first argument in Sve.InsertIntoShiftedVector<T> is of type Vector<T>, we'd expect op1Reg to always be a vector register, right? I could add an assert here clarify this. (Though looking at emitIns_Mov, it does have a path for emitting mov instructions from a gpr to a vector register.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I meant that for op2Reg. Sorry. So ideally, where you have double or float as 2nd argument, we should have op2Reg as SIMD/floating point and otherwise should be gpr. Can you verify that please? For op1Reg it should always be scalable register and we already assert that in emitter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries, I've added an assert to check this. Stress tests for unoptimized and optimized tests are still passing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mind sharing small section of disassembly for both the categories?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not at all. Here's a snippet from the double tests:
            ldr     q16, [x0]
            ldr     d17, [fp, #0x30]    // [V01 loc0]
            insr    z16.d, d17
And from the uint tests:
            ldr     q16, [x0]
            ldr     w0, [fp, #0x34]     // [V01 loc0]
            insr    z16.s, w0
          
 Sure; right here. Notice how in the if-else statement, we check if the instruction is scalable before we check if it has RMW semantics.   | 
    
| 
           I made a small change to   | 
    
| 
           @kunalspathak @a74nh does this PR need anything else?  | 
    
          
 Your change looks good except that I realized that after #103620, we do not need condition anymore for below code because both branches are doing same thing. if (HWIntrinsicInfo::IsExplicitMaskedOperation(intrin.id))
{
    GetEmitter()->emitIns_R_R_R(ins, emitSize, targetReg, op1Reg, op2Reg, opt);
}
else
{
    // This generates an unpredicated version
    // Implicitly predicated should be taken care above `intrin.op2->IsEmbMaskOp()`
    GetEmitter()->emitIns_R_R_R(ins, emitSize, targetReg, op1Reg, op2Reg, opt);
} | 
    
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
          
 Got it, I'll fix that in my next PR.  | 
    
| 
           /ba-g Build Analysis blocked by timeouts  | 
    
Part of #99957.
InsertIntoShiftedVectorrequires a new test template,_SveVecAndScalarOpTest.template. This template is a clone of_SveMasklessUnaryOpTestTemplate.template, except that a second argument of typeTis passed toSve.InsertIntoShiftedVector<T>and handled elsewhere accordingly. I could've implementedInsertIntoShiftedVectorwithout any special codegen, but the path for handling unpredicated instructions with two operands inCodeGen::genHWIntrinsicchecks if the instruction is scalable before checking if it has RMW semantics (which this one does), and thus ends up calling the wrongemitIns*method. Existing instructions seem to be dependent on this order of checking, so I cannot flip the order of the if-elseif-else statement without breaking existing tests, and I wasn't sure if duplicating the RMW logic on the scalable path would be ideal if it's only for one instruction -- if we prefer to do that now, I can get rid of the special path for this intrinsic, and handle the scalable RMW case on the general path.Not tests:
InsertIntoShiftedVector tests:
cc @dotnet/arm64-contrib