[SystemZ] Add support for half (fp16) #109164
@llvm/pr-subscribers-llvm-selectiondag @llvm/pr-subscribers-clang
Author: Jonas Paulsson (JonPsson1)
Changes
Make sure that fp16<=>float conversions are expanded to libcalls and that 16-bit fp values can be loaded and stored properly via GPRs. With this patch the Half IR Type used in operations should be handled correctly with the help of pre-existing ISD node expansions.
Patch in progress...
Notes:
Full diff: https://github.com/llvm/llvm-project/pull/109164.diff 3 Files Affected:
diff --git a/clang/lib/Basic/Targets/SystemZ.h b/clang/lib/Basic/Targets/SystemZ.h
index f05ea473017bec..6566b63d4587ee 100644
--- a/clang/lib/Basic/Targets/SystemZ.h
+++ b/clang/lib/Basic/Targets/SystemZ.h
@@ -91,11 +91,20 @@ class LLVM_LIBRARY_VISIBILITY SystemZTargetInfo : public TargetInfo {
"-v128:64-a:8:16-n32:64");
}
MaxAtomicPromoteWidth = MaxAtomicInlineWidth = 128;
+
+ HasLegalHalfType = false; // Default=false
+ HalfArgsAndReturns = false; // Default=false
+ HasFloat16 = true; // Default=false
+
HasStrictFP = true;
}
unsigned getMinGlobalAlign(uint64_t Size, bool HasNonWeakDef) const override;
+ bool useFP16ConversionIntrinsics() const override {
+ return false;
+ }
+
void getTargetDefines(const LangOptions &Opts,
MacroBuilder &Builder) const override;
diff --git a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
index 582a8c139b2937..fd3dcebba1eca7 100644
--- a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
+++ b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
@@ -704,6 +704,13 @@ SystemZTargetLowering::SystemZTargetLowering(const TargetMachine &TM,
setOperationAction(ISD::BITCAST, MVT::f32, Custom);
}
+ // Expand FP16 <=> FP32 conversions to libcalls and handle FP16 loads and
+ // stores in GPRs.
+ setOperationAction(ISD::FP16_TO_FP, MVT::f32, Expand);
+ setOperationAction(ISD::FP_TO_FP16, MVT::f32, Expand);
+ setLoadExtAction(ISD::EXTLOAD, MVT::f32, MVT::f16, Expand);
+ setTruncStoreAction(MVT::f32, MVT::f16, Expand);
+
// VASTART and VACOPY need to deal with the SystemZ-specific varargs
// structure, but VAEND is a no-op.
setOperationAction(ISD::VASTART, MVT::Other, Custom);
diff --git a/llvm/test/CodeGen/SystemZ/fp-half.ll b/llvm/test/CodeGen/SystemZ/fp-half.ll
new file mode 100644
index 00000000000000..393ba2f620ff6e
--- /dev/null
+++ b/llvm/test/CodeGen/SystemZ/fp-half.ll
@@ -0,0 +1,100 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s -mtriple=s390x-linux-gnu -mcpu=z10 | FileCheck %s
+;
+; Tests for FP16 (Half).
+
+; A function where everything is done in Half.
+define void @fun0(ptr %Op0, ptr %Op1, ptr %Dst) {
+; CHECK-LABEL: fun0:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: stmg %r12, %r15, 96(%r15)
+; CHECK-NEXT: .cfi_offset %r12, -64
+; CHECK-NEXT: .cfi_offset %r13, -56
+; CHECK-NEXT: .cfi_offset %r14, -48
+; CHECK-NEXT: .cfi_offset %r15, -40
+; CHECK-NEXT: aghi %r15, -168
+; CHECK-NEXT: .cfi_def_cfa_offset 328
+; CHECK-NEXT: std %f8, 160(%r15) # 8-byte Folded Spill
+; CHECK-NEXT: .cfi_offset %f8, -168
+; CHECK-NEXT: llgh %r2, 0(%r2)
+; CHECK-NEXT: lgr %r13, %r4
+; CHECK-NEXT: lgr %r12, %r3
+; CHECK-NEXT: brasl %r14, __gnu_h2f_ieee@PLT
+; CHECK-NEXT: llgh %r2, 0(%r12)
+; CHECK-NEXT: ler %f8, %f0
+; CHECK-NEXT: brasl %r14, __gnu_h2f_ieee@PLT
+; CHECK-NEXT: aebr %f0, %f8
+; CHECK-NEXT: brasl %r14, __gnu_f2h_ieee@PLT
+; CHECK-NEXT: sth %r2, 0(%r13)
+; CHECK-NEXT: ld %f8, 160(%r15) # 8-byte Folded Reload
+; CHECK-NEXT: lmg %r12, %r15, 264(%r15)
+; CHECK-NEXT: br %r14
+entry:
+ %0 = load half, ptr %Op0, align 2
+ %1 = load half, ptr %Op1, align 2
+ %add = fadd half %0, %1
+ store half %add, ptr %Dst, align 2
+ ret void
+}
+
+; A function where Half values are loaded and extended to float and then
+; operated on.
+define void @fun1(ptr %Op0, ptr %Op1, ptr %Dst) {
+; CHECK-LABEL: fun1:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: stmg %r12, %r15, 96(%r15)
+; CHECK-NEXT: .cfi_offset %r12, -64
+; CHECK-NEXT: .cfi_offset %r13, -56
+; CHECK-NEXT: .cfi_offset %r14, -48
+; CHECK-NEXT: .cfi_offset %r15, -40
+; CHECK-NEXT: aghi %r15, -168
+; CHECK-NEXT: .cfi_def_cfa_offset 328
+; CHECK-NEXT: std %f8, 160(%r15) # 8-byte Folded Spill
+; CHECK-NEXT: .cfi_offset %f8, -168
+; CHECK-NEXT: llgh %r2, 0(%r2)
+; CHECK-NEXT: lgr %r13, %r4
+; CHECK-NEXT: lgr %r12, %r3
+; CHECK-NEXT: brasl %r14, __gnu_h2f_ieee@PLT
+; CHECK-NEXT: llgh %r2, 0(%r12)
+; CHECK-NEXT: ler %f8, %f0
+; CHECK-NEXT: brasl %r14, __gnu_h2f_ieee@PLT
+; CHECK-NEXT: aebr %f0, %f8
+; CHECK-NEXT: brasl %r14, __gnu_f2h_ieee@PLT
+; CHECK-NEXT: sth %r2, 0(%r13)
+; CHECK-NEXT: ld %f8, 160(%r15) # 8-byte Folded Reload
+; CHECK-NEXT: lmg %r12, %r15, 264(%r15)
+; CHECK-NEXT: br %r14
+entry:
+ %0 = load half, ptr %Op0, align 2
+ %ext = fpext half %0 to float
+ %1 = load half, ptr %Op1, align 2
+ %ext1 = fpext half %1 to float
+ %add = fadd float %ext, %ext1
+ %res = fptrunc float %add to half
+ store half %res, ptr %Dst, align 2
+ ret void
+}
+
+; Test case with a Half incoming argument.
+define zeroext i1 @fun2(half noundef %f) {
+; CHECK-LABEL: fun2:
+; CHECK: # %bb.0: # %start
+; CHECK-NEXT: stmg %r14, %r15, 112(%r15)
+; CHECK-NEXT: .cfi_offset %r14, -48
+; CHECK-NEXT: .cfi_offset %r15, -40
+; CHECK-NEXT: aghi %r15, -160
+; CHECK-NEXT: .cfi_def_cfa_offset 320
+; CHECK-NEXT: brasl %r14, __gnu_f2h_ieee@PLT
+; CHECK-NEXT: brasl %r14, __gnu_h2f_ieee@PLT
+; CHECK-NEXT: larl %r1, .LCPI2_0
+; CHECK-NEXT: deb %f0, 0(%r1)
+; CHECK-NEXT: brasl %r14, __gnu_f2h_ieee@PLT
+; CHECK-NEXT: risbg %r2, %r2, 63, 191, 49
+; CHECK-NEXT: lmg %r14, %r15, 272(%r15)
+; CHECK-NEXT: br %r14
+start:
+ %self = fdiv half %f, 0xHC700
+ %_4 = bitcast half %self to i16
+ %_0 = icmp slt i16 %_4, 0
+ ret i1 %_0
+}
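For readers decoding the half constant in fun2: the sketch below splits an IEEE 754 binary16 bit pattern into its fields (1 sign bit, 5 exponent bits with bias 15, 10 fraction bits) and computes its value. It is only an illustration; the names HalfFields, decodeHalf, halfBitsToDouble and signBitSet are hypothetical and are not part of the patch. Under these assumptions 0xHC700 decodes to -7.0, and the `icmp slt i16 ..., 0` in fun2 amounts to a sign-bit test on the raw bits.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper type: the three fields of an IEEE 754 binary16 value.
struct HalfFields {
  bool Negative;     // sign bit
  unsigned Exponent; // biased exponent, 5 bits (bias 15)
  unsigned Fraction; // 10 fraction bits
};

// Split a raw 16-bit pattern into sign, exponent and fraction.
HalfFields decodeHalf(uint16_t Bits) {
  return {(Bits & 0x8000u) != 0, (Bits >> 10) & 0x1Fu, Bits & 0x3FFu};
}

// Compute the numeric value of a binary16 bit pattern as a double.
double halfBitsToDouble(uint16_t Bits) {
  HalfFields F = decodeHalf(Bits);
  double Mag;
  if (F.Exponent == 0x1F) // all-ones exponent: infinity or NaN
    Mag = F.Fraction ? std::numeric_limits<double>::quiet_NaN()
                     : std::numeric_limits<double>::infinity();
  else if (F.Exponent == 0) // zero or subnormal: fraction * 2^-24
    Mag = std::ldexp(static_cast<double>(F.Fraction), -24);
  else // normal: (1024 + fraction) * 2^(exponent - 15 - 10)
    Mag = std::ldexp(static_cast<double>(F.Fraction | 0x400u),
                     static_cast<int>(F.Exponent) - 25);
  return F.Negative ? -Mag : Mag;
}

// Equivalent to a signed "less than zero" compare on the i16 bit pattern.
bool signBitSet(uint16_t Bits) { return (Bits & 0x8000u) != 0; }
```

For example, decodeHalf(0xC700) yields a set sign bit, biased exponent 17 and fraction 0x300, so the value is -(1 + 0x300/1024) * 2^(17-15) = -7.0.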
|
You can test this locally with the following command:
git-clang-format --diff HEAD~1 HEAD --extensions c,h,cpp -- clang/test/CodeGen/SystemZ/Float16.c clang/test/CodeGen/SystemZ/fp16.c compiler-rt/lib/builtins/extendhfdf2.c compiler-rt/test/builtins/Unit/extendhfdf2_test.c clang/include/clang/Basic/TargetInfo.h clang/lib/Basic/Targets/SystemZ.h clang/lib/CodeGen/Targets/SystemZ.cpp clang/test/CodeGen/SystemZ/strictfp_builtins.c clang/test/CodeGen/SystemZ/systemz-abi.c clang/test/CodeGen/SystemZ/systemz-inline-asm.c compiler-rt/lib/builtins/clear_cache.c llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp llvm/lib/Target/SystemZ/AsmParser/SystemZAsmParser.cpp llvm/lib/Target/SystemZ/MCTargetDesc/SystemZMCTargetDesc.cpp llvm/lib/Target/SystemZ/MCTargetDesc/SystemZMCTargetDesc.h llvm/lib/Target/SystemZ/SystemZAsmPrinter.cpp llvm/lib/Target/SystemZ/SystemZISelDAGToDAG.cpp llvm/lib/Target/SystemZ/SystemZISelLowering.cpp llvm/lib/Target/SystemZ/SystemZISelLowering.h llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp llvm/lib/Target/SystemZ/SystemZRegisterInfo.cpp llvm/lib/Target/SystemZ/SystemZRegisterInfo.h
View the diff from clang-format here.
diff --git a/compiler-rt/test/builtins/Unit/extendhfdf2_test.c b/compiler-rt/test/builtins/Unit/extendhfdf2_test.c
index 422e272c1..bf33291d8 100644
--- a/compiler-rt/test/builtins/Unit/extendhfdf2_test.c
+++ b/compiler-rt/test/builtins/Unit/extendhfdf2_test.c
@@ -7,81 +7,63 @@
double __extendhfdf2(TYPE_FP16 a);
-int test__extendhfdf2(TYPE_FP16 a, uint64_t expected)
-{
- double x = __extendhfdf2(a);
- int ret = compareResultD(x, expected);
+int test__extendhfdf2(TYPE_FP16 a, uint64_t expected) {
+ double x = __extendhfdf2(a);
+ int ret = compareResultD(x, expected);
- if (ret){
- printf("error in test__extendhfdf2(%#.4x) = %f, "
- "expected %f\n", toRep16(a), x, fromRep64(expected));
- }
- return ret;
+ if (ret) {
+ printf("error in test__extendhfdf2(%#.4x) = %f, "
+ "expected %f\n",
+ toRep16(a), x, fromRep64(expected));
+ }
+ return ret;
}
char assumption_1[sizeof(TYPE_FP16) * CHAR_BIT == 16] = {0};
-int main()
-{
- // qNaN
- if (test__extendhfdf2(makeQNaN16(),
- UINT64_C(0x7ff8000000000000)))
- return 1;
- // NaN
- if (test__extendhfdf2(fromRep16(0x7f80),
- UINT64_C(0x7ffe000000000000)))
- return 1;
- // inf
- if (test__extendhfdf2(makeInf16(),
- UINT64_C(0x7ff0000000000000)))
- return 1;
- // -inf
- if (test__extendhfdf2(makeNegativeInf16(),
- UINT64_C(0xfff0000000000000)))
- return 1;
- // zero
- if (test__extendhfdf2(fromRep16(0x0),
- UINT64_C(0x0)))
- return 1;
- // -zero
- if (test__extendhfdf2(fromRep16(0x8000),
- UINT64_C(0x8000000000000000)))
- return 1;
- if (test__extendhfdf2(fromRep16(0x4248),
- UINT64_C(0x4009200000000000)))
- return 1;
- if (test__extendhfdf2(fromRep16(0xc248),
- UINT64_C(0xc009200000000000)))
- return 1;
- if (test__extendhfdf2(fromRep16(0x6e62),
- UINT64_C(0x40b9880000000000)))
- return 1;
- if (test__extendhfdf2(fromRep16(0x3c00),
- UINT64_C(0x3ff0000000000000)))
- return 1;
- if (test__extendhfdf2(fromRep16(0x0400),
- UINT64_C(0x3f10000000000000)))
- return 1;
- // denormal
- if (test__extendhfdf2(fromRep16(0x0010),
- UINT64_C(0x3eb0000000000000)))
- return 1;
- if (test__extendhfdf2(fromRep16(0x0001),
- UINT64_C(0x3e70000000000000)))
- return 1;
- if (test__extendhfdf2(fromRep16(0x8001),
- UINT64_C(0xbe70000000000000)))
- return 1;
- if (test__extendhfdf2(fromRep16(0x0001),
- UINT64_C(0x3e70000000000000)))
- return 1;
- // max (precise)
- if (test__extendhfdf2(fromRep16(0x7bff),
- UINT64_C(0x40effc0000000000)))
- return 1;
- // max (rounded)
- if (test__extendhfdf2(fromRep16(0x7bff),
- UINT64_C(0x40effc0000000000)))
- return 1;
- return 0;
+int main() {
+ // qNaN
+ if (test__extendhfdf2(makeQNaN16(), UINT64_C(0x7ff8000000000000)))
+ return 1;
+ // NaN
+ if (test__extendhfdf2(fromRep16(0x7f80), UINT64_C(0x7ffe000000000000)))
+ return 1;
+ // inf
+ if (test__extendhfdf2(makeInf16(), UINT64_C(0x7ff0000000000000)))
+ return 1;
+ // -inf
+ if (test__extendhfdf2(makeNegativeInf16(), UINT64_C(0xfff0000000000000)))
+ return 1;
+ // zero
+ if (test__extendhfdf2(fromRep16(0x0), UINT64_C(0x0)))
+ return 1;
+ // -zero
+ if (test__extendhfdf2(fromRep16(0x8000), UINT64_C(0x8000000000000000)))
+ return 1;
+ if (test__extendhfdf2(fromRep16(0x4248), UINT64_C(0x4009200000000000)))
+ return 1;
+ if (test__extendhfdf2(fromRep16(0xc248), UINT64_C(0xc009200000000000)))
+ return 1;
+ if (test__extendhfdf2(fromRep16(0x6e62), UINT64_C(0x40b9880000000000)))
+ return 1;
+ if (test__extendhfdf2(fromRep16(0x3c00), UINT64_C(0x3ff0000000000000)))
+ return 1;
+ if (test__extendhfdf2(fromRep16(0x0400), UINT64_C(0x3f10000000000000)))
+ return 1;
+ // denormal
+ if (test__extendhfdf2(fromRep16(0x0010), UINT64_C(0x3eb0000000000000)))
+ return 1;
+ if (test__extendhfdf2(fromRep16(0x0001), UINT64_C(0x3e70000000000000)))
+ return 1;
+ if (test__extendhfdf2(fromRep16(0x8001), UINT64_C(0xbe70000000000000)))
+ return 1;
+ if (test__extendhfdf2(fromRep16(0x0001), UINT64_C(0x3e70000000000000)))
+ return 1;
+ // max (precise)
+ if (test__extendhfdf2(fromRep16(0x7bff), UINT64_C(0x40effc0000000000)))
+ return 1;
+ // max (rounded)
+ if (test__extendhfdf2(fromRep16(0x7bff), UINT64_C(0x40effc0000000000)))
+ return 1;
+ return 0;
}
diff --git a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
index fdbfc196e..8b0225347 100644
--- a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
+++ b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
@@ -229,8 +229,8 @@ SystemZTargetLowering::SystemZTargetLowering(const TargetMachine &TM,
// The fp<=>i32/i64 conversions are all Legal except for f16 and for
// unsigned on z10 (only z196 and above have native support for
// unsigned conversions).
- for (auto Op : {ISD::FP_TO_SINT, ISD::STRICT_FP_TO_SINT,
- ISD::SINT_TO_FP, ISD::STRICT_SINT_TO_FP})
+ for (auto Op : {ISD::FP_TO_SINT, ISD::STRICT_FP_TO_SINT, ISD::SINT_TO_FP,
+ ISD::STRICT_SINT_TO_FP})
setOperationAction(Op, VT, Custom);
for (auto Op : {ISD::FP_TO_UINT, ISD::STRICT_FP_TO_UINT})
setOperationAction(Op, VT, Custom);
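The expected values in extendhfdf2_test.c above can be reproduced with a small bit-level widening routine. The following is a minimal sketch, not the compiler-rt implementation of __extendhfdf2: the function name extendHalfToDoubleBits is hypothetical, and it works on raw bit patterns instead of TYPE_FP16 and double. It rebiases the exponent from 15 to 1023, shifts the 10 fraction bits to the top of the 52-bit double fraction, and normalizes subnormal inputs.

```cpp
#include <cstdint>

// Widen a binary16 bit pattern to the bit pattern of the equal binary64 value.
uint64_t extendHalfToDoubleBits(uint16_t H) {
  uint64_t Sign = static_cast<uint64_t>(H & 0x8000u) << 48; // bit 15 -> bit 63
  int Exp = (H >> 10) & 0x1F;                               // biased by 15
  uint64_t Frac = H & 0x3FFu;

  // Infinity and NaN: all-ones exponent, fraction payload kept at the top.
  if (Exp == 0x1F)
    return Sign | (0x7FFull << 52) | (Frac << 42);

  if (Exp == 0) {
    if (Frac == 0)
      return Sign; // signed zero
    // Subnormal: same scale as biased exponent 1, but no implicit leading 1.
    // Shift the fraction up until the implicit bit position (bit 10) is set.
    Exp = 1;
    while (!(Frac & 0x400u)) {
      Frac <<= 1;
      --Exp;
    }
    Frac &= 0x3FFu; // drop the now-implicit leading 1
  }

  // Rebias the exponent (15 -> 1023) and widen the fraction (10 -> 52 bits).
  uint64_t DoubleExp = static_cast<uint64_t>(Exp - 15 + 1023);
  return Sign | (DoubleExp << 52) | (Frac << 42);
}
```

Under these assumptions, extendHalfToDoubleBits(0x0001) gives 0x3e70000000000000 and extendHalfToDoubleBits(0x7bff) gives 0x40effc0000000000, matching the fromRep16 cases listed in the test.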
|
Note that you need to also have softPromoteHalfType return true to get correct legalization for half operations. |
Thanks for pointing that out - patch updated. |
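As a rough illustration of what soft promotion means for a half operation: the value stays a 16-bit integer pattern, and each operation extends its operands to float, computes in float, and rounds the result back to half immediately. The sketch below models that for fadd. It is an assumption-laden stand-in, not LLVM's legalizer code: halfBitsToFloat and floatToHalfBits are hypothetical substitutes for the conversion libcalls, and softPromotedFAdd is a made-up name.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical stand-in for the half-to-float libcall (raw bits in, float out).
float halfBitsToFloat(uint16_t H) {
  unsigned Exp = (H >> 10) & 0x1Fu, Frac = H & 0x3FFu;
  float Mag;
  if (Exp == 0x1F)
    Mag = Frac ? std::numeric_limits<float>::quiet_NaN()
               : std::numeric_limits<float>::infinity();
  else if (Exp == 0)
    Mag = std::ldexp(static_cast<float>(Frac), -24); // zero or subnormal
  else
    Mag = std::ldexp(static_cast<float>(Frac | 0x400u),
                     static_cast<int>(Exp) - 25);
  return (H & 0x8000u) ? -Mag : Mag;
}

// Hypothetical stand-in for the float-to-half libcall, rounding to nearest
// even. NaN payloads are not preserved (assumption made for brevity).
uint16_t floatToHalfBits(float F) {
  uint32_t X;
  std::memcpy(&X, &F, sizeof X);
  uint32_t Sign = (X >> 16) & 0x8000u, Abs = X & 0x7FFFFFFFu;
  if (Abs > 0x7F800000u)
    return static_cast<uint16_t>(Sign | 0x7E00u); // NaN -> quiet NaN
  int Exp = static_cast<int>(Abs >> 23) - 127 + 15; // rebias 127 -> 15
  uint32_t Man = Abs & 0x7FFFFFu;
  if (Exp >= 0x1F)
    return static_cast<uint16_t>(Sign | 0x7C00u); // infinity or overflow
  uint32_t Shift, Base;
  if (Exp <= 0) {
    if (Exp < -10)
      return static_cast<uint16_t>(Sign); // too small: rounds to zero
    Man |= 0x800000u;                     // make the implicit bit explicit
    Shift = static_cast<uint32_t>(14 - Exp);
    Base = 0;
  } else {
    Shift = 13;
    Base = static_cast<uint32_t>(Exp) << 10;
  }
  uint32_t Half = Base | (Man >> Shift);
  uint32_t Rem = Man & ((1u << Shift) - 1), Tie = 1u << (Shift - 1);
  if (Rem > Tie || (Rem == Tie && (Half & 1u)))
    ++Half; // a carry into the exponent field is the correct result
  return static_cast<uint16_t>(Sign | Half);
}

// One soft-promoted fadd: extend both operands, add in float, round back.
uint16_t softPromotedFAdd(uint16_t A, uint16_t B) {
  return floatToHalfBits(halfBitsToFloat(A) + halfBitsToFloat(B));
}
```

Rounding after every operation is what makes the result match half semantics. For instance, with 2048 (0x6800) and 1 (0x3C00), softPromotedFAdd(softPromotedFAdd(0x6800, 0x3C00), 0x3C00) stays at 0x6800 because 2049 rounds back to 2048 each time, whereas keeping the intermediate sum in float would give 2050 (0x6801).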
I think we should define and implement a proper ABI for the half type as well. |
Patch updated after some progress... With this version, the fp16 values are passed to conversion functions as integer, which seems to be the default. It is however a bit tricky to do this and at the same time pass half values in FP registers. At this point I wonder whether it would be better to pass FP16 values to the conversion functions as _Float16 instead. It seems this may be possible to change in the configurations by looking at COMPILER_RT_HAS_FLOAT16 / compiler-rt/lib/builtins/extendhfsf2.c / fp_extend.h... Not really sure if those conversion functions are supposed to be built and only used for soft-promotion of fp16, or if there are any external implications, for instance gcc compatibility. Any other comments also welcome... |
My understanding is that in GCC's From your first two sentences it sounds like @uweigand mentioned figuring out an ABI for A quick check seems to show that GCC 13 does not support Note that there are some common issues with these conversions, would probably be good to test against them if possible #97981 #97975. |
From what I can see in the libgcc sources, I never see
Yes, we're working on that. What we're planning to do is to have
Yes, we'd have to add those. I don't think we want
Thanks for pointing this out! |
I think this is accurate, libgcc just appears to (reasonably) not provide any f16-related symbols on platforms where GCC doesn't support For that reason we just always provide the symbols in rust's compiler-builtins (though we let LLVM figure out that
That is great news, especially considering how problematic the target-feature-dependent ABI on x86-32 has been. |
Patch reworked:
(twoaddr-kill.mir test updated as the hard-coded register class enum value for GRH32BitRegClass has changed.) Still some more points to go over, but it seems to be working fairly well at this point.
|
Patch improved further:
|
Not a full review, but some general comments inline.
setOperationAction(ISD::FCOS, VT, Expand);
setOperationAction(ISD::FSINCOS, VT, Expand);
setOperationAction(ISD::FREM, VT, Expand);
setOperationAction(ISD::FPOW, VT, Expand);
Shouldn't these be Promote just like all the other f16 operations? Expand triggers a libcall, which doesn't match the excess-precision setting - also, we actually don't have f16 libcalls in libm ...
ok, if there are no f16 libcalls it works to have them be promoted.
Just crosslinking that there is an effort to add f16 libcalls #95250 but I have no clue what the plan is as far as lowering to them.
Rebased:
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/168/builds/10952 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/94/builds/6266 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/186/builds/8262 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/72/builds/10239 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/145/builds/6404 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/146/builds/2721 Here is the relevant piece of the build log for the reference
|
LLVM21 fixed the new float types on a number of targets:
* SystemZ gained f16 support llvm/llvm-project#109164
* Hexagon now uses soft f16 to avoid recursion bugs llvm/llvm-project#130977
* Mips now correctly handles f128 llvm/llvm-project#117525
* f128 is now correctly aligned when passing the stack on x86 llvm/llvm-project#138092
Thus, enable the types on relevant targets for LLVM > 21.0.0.
NVPTX also gained handling of f128 as a storage type, but it lacks support for basic math operations so is still disabled here.
Enable f16 and f128 on targets that were fixed in LLVM21
LLVM21 fixed the new float types on a number of targets:
* SystemZ gained f16 support llvm/llvm-project#109164
* Hexagon now uses soft f16 to avoid recursion bugs llvm/llvm-project#130977
* Mips now correctly handles f128 (actually since LLVM20) llvm/llvm-project#117525
* f128 is now correctly aligned when passing the stack on x86 llvm/llvm-project#138092
Thus, enable the types on relevant targets for LLVM > 21.0.0.
NVPTX also gained handling of f128 as a storage type, but it lacks support for basic math operations so is still disabled here.
try-job: dist-i586-gnu-i586-i686-musl
try-job: dist-i686-linux
try-job: dist-i686-msvc
try-job: dist-s390x-linux
try-job: dist-various-1
try-job: dist-various-2
try-job: dist-x86_64-linux
try-job: i686-gnu-1
try-job: i686-gnu-2
try-job: i686-msvc-1
try-job: i686-msvc-2
try-job: test-various
Rollup merge of #144987 - tgross35:llvm21-f16-f128, r=nikic
Enable f16 and f128 on targets that were fixed in LLVM21
LLVM21 fixed the new float types on a number of targets:
* SystemZ gained f16 support llvm/llvm-project#109164
* Hexagon now uses soft f16 to avoid recursion bugs llvm/llvm-project#130977
* Mips now correctly handles f128 (actually since LLVM20) llvm/llvm-project#117525
* f128 is now correctly aligned when passing the stack on x86 llvm/llvm-project#138092
Thus, enable the types on relevant targets for LLVM > 21.0.0.
NVPTX also gained handling of f128 as a storage type, but it lacks support for basic math operations so is still disabled here.
try-job: dist-i586-gnu-i586-i686-musl
try-job: dist-i686-linux
try-job: dist-i686-msvc
try-job: dist-s390x-linux
try-job: dist-various-1
try-job: dist-various-2
try-job: dist-x86_64-linux
try-job: i686-gnu-1
try-job: i686-gnu-2
try-job: i686-msvc-1
try-job: i686-msvc-2
try-job: test-various
Make sure that fp16<=>float conversions are expanded to libcalls and that 16-bit fp values can be loaded and stored properly via GPRs. With this patch the Half IR Type used in operations should be handled correctly with the help of pre-existing ISD node expansions.
Patch in progress...
Fixes #50374