Skip to content
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -20241,6 +20241,9 @@ static SDValue trySimplifySrlAddToRshrnb(SDValue Srl, SelectionDAG &DAG,
return SDValue();
unsigned ShiftValue = SrlOp1->getZExtValue();

if (ShiftValue > ResVT.getScalarSizeInBits())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's unlikely to happen, but is it worth being paranoid here and also test for the zero shift case? The instruction RSHRNB only permits shifts between 1 and the dest element bit width.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason the shift amount cannot be clamped to the maximum rather than not doing the transformation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@david-arm zero shifts get nuked out of existence really early on by other passes so I'm skeptical this isn't just slowing down the transform. That being said I'll add a test with a 0 shift just to assert it's being correctly removed. From what I can see as well, it isn't possible for SVE shifts to emit if the shift amount is 0. So even if this code bailed on a zero shift, it wouldn't be valid code anyway.

Copy link
Contributor Author

@MDevereau MDevereau Sep 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@paulwalker-arm I was trying to think of a case but since the add is checked for single use and the top half of the elements get narrowed I think this it's safe to clamp it to the maximum value.

return SDValue();

SDValue Add = Srl->getOperand(0);
if (Add->getOpcode() != ISD::ADD || !Add->hasOneUse())
return SDValue();
Expand Down
46 changes: 46 additions & 0 deletions llvm/test/CodeGen/AArch64/sve2-intrinsics-combine-rshrnb.ll
Original file line number Diff line number Diff line change
Expand Up @@ -142,6 +142,52 @@ define void @wide_add_shift_add_rshrnb_h(ptr %dest, i64 %index, <vscale x 8 x i3
ret void
}

define void @wide_add_shift_add_rshrnb_d(ptr %dest, i64 %index, <vscale x 4 x i64> %arg1){
; CHECK-LABEL: wide_add_shift_add_rshrnb_d:
; CHECK: // %bb.0:
; CHECK-NEXT: ptrue p0.s
; CHECK-NEXT: rshrnb z1.s, z1.d, #32
; CHECK-NEXT: rshrnb z0.s, z0.d, #32
; CHECK-NEXT: uzp1 z0.s, z0.s, z1.s
; CHECK-NEXT: ld1w { z1.s }, p0/z, [x0, x1, lsl #2]
; CHECK-NEXT: add z0.s, z1.s, z0.s
; CHECK-NEXT: st1w { z0.s }, p0, [x0, x1, lsl #2]
; CHECK-NEXT: ret
%1 = add <vscale x 4 x i64> %arg1, shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 2147483648, i64 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
%2 = lshr <vscale x 4 x i64> %1, shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 32, i64 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
%3 = getelementptr inbounds i32, ptr %dest, i64 %index
%load = load <vscale x 4 x i32>, ptr %3, align 4
%4 = trunc <vscale x 4 x i64> %2 to <vscale x 4 x i32>
%5 = add <vscale x 4 x i32> %load, %4
store <vscale x 4 x i32> %5, ptr %3, align 4
ret void
}

; Do not emit rshrnb if the shift amount is larger than the dest eltsize in bits
define void @neg_wide_add_shift_add_rshrnb_d(ptr %dest, i64 %index, <vscale x 4 x i64> %arg1){
; CHECK-LABEL: neg_wide_add_shift_add_rshrnb_d:
; CHECK: // %bb.0:
; CHECK-NEXT: mov z2.d, #0x800000000000
; CHECK-NEXT: ptrue p0.s
; CHECK-NEXT: add z0.d, z0.d, z2.d
; CHECK-NEXT: add z1.d, z1.d, z2.d
; CHECK-NEXT: lsr z1.d, z1.d, #48
; CHECK-NEXT: lsr z0.d, z0.d, #48
; CHECK-NEXT: uzp1 z0.s, z0.s, z1.s
; CHECK-NEXT: ld1w { z1.s }, p0/z, [x0, x1, lsl #2]
; CHECK-NEXT: add z0.s, z1.s, z0.s
; CHECK-NEXT: st1w { z0.s }, p0, [x0, x1, lsl #2]
; CHECK-NEXT: ret
%1 = add <vscale x 4 x i64> %arg1, shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 140737488355328, i64 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
%2 = lshr <vscale x 4 x i64> %1, shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 48, i64 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
%3 = getelementptr inbounds i32, ptr %dest, i64 %index
%load = load <vscale x 4 x i32>, ptr %3, align 4
%4 = trunc <vscale x 4 x i64> %2 to <vscale x 4 x i32>
%5 = add <vscale x 4 x i32> %load, %4
store <vscale x 4 x i32> %5, ptr %3, align 4
ret void
}

define void @neg_trunc_lsr_add_op1_not_splat(ptr %ptr, ptr %dst, i64 %index, <vscale x 8 x i16> %add_op1){
; CHECK-LABEL: neg_trunc_lsr_add_op1_not_splat:
; CHECK: // %bb.0:
Expand Down