Several fixes for icx2023.2 (including fixes for sqrt FPEs in ixx/oxx/vxx) #737
Conversation
…me undetected errors! (NB this is itscrd90, so different from the baseline itscrd80) The summary says status=0...

STARTED AT Mon Jul 24 07:58:00 PM CEST 2023
./tput/teeThroughputX.sh -mix -hrd -makej -eemumu -ggtt -ggttg -ggttgg -gqttq -ggttggg -makeclean
ENDED(1) AT Mon Jul 24 11:44:19 PM CEST 2023 [Status=0]
./tput/teeThroughputX.sh -flt -hrd -makej -eemumu -ggtt -ggttgg -inlonly -makeclean
ENDED(2) AT Tue Jul 25 12:08:13 AM CEST 2023 [Status=0]
./tput/teeThroughputX.sh -makej -eemumu -ggtt -ggttg -gqttq -ggttgg -ggttggg -flt -bridge -makeclean
ENDED(3) AT Tue Jul 25 12:18:54 AM CEST 2023 [Status=0]
./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -flt -rmbhst
ENDED(4) AT Tue Jul 25 12:22:12 AM CEST 2023 [Status=0]
./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -flt -curhst
ENDED(5) AT Tue Jul 25 12:25:28 AM CEST 2023 [Status=0]

But actually some tests have failed...

tput/logs_ggttg_mad/log_ggttg_mad_f_inl0_hrd0.txt:Floating Point Exception (CPU neppV=1): 'unknown' ievt=-1
tput/logs_ggttg_mad/log_ggttg_mad_f_inl0_hrd0.txt:Floating Point Exception (CPU neppV=4): 'unknown' ievt=-1
tput/logs_ggttg_mad/log_ggttg_mad_f_inl0_hrd0.txt:Floating Point Exception (CPU neppV=8): 'unknown' ievt=-1
tput/logs_ggttg_mad/log_ggttg_mad_f_inl0_hrd0.txt:Floating Point Exception (GPU): 'unknown' ievt=-1
tput/logs_ggttg_mad/log_ggttg_mad_f_inl0_hrd0.txt:Floating Point Exception (GPU): 'unknown' ievt=-1
STARTED AT Tue Jul 25 12:28:50 AM CEST 2023
ENDED AT Tue Jul 25 04:39:58 AM CEST 2023 Status=0

24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_m_inl0_hrd0.txt
1 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_d_inl0_hrd0.txt
1 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_f_inl0_hrd0.txt
1 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_m_inl0_hrd0.txt
0 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_d_inl0_hrd0.txt
…E-2 to 4E-2 in single precision for momenta (madgraph5#735)
This fixes the test failure (also in gcc), but there is an FPE madgraph5#736 in icx
…bug FPE madgraph5#736

The FPE seems to be in testxxx.cc?

Running main() from /data/avalassi/GPU2023/madgraph4gpuX/test/googletest/googletest/src/gtest_main.cc
[==========] Running 3 tests from 3 test suites.
[----------] Global test environment set-up.
[----------] 1 test from SIGMA_SM_GG_TTX_CPU_XXX
[ RUN ] SIGMA_SM_GG_TTX_CPU_XXX.testxxx
Program received signal SIGFPE, Arithmetic exception.
0x0000000000413193 in SIGMA_SM_GG_TTX_CPU_XXX_testxxx_Test::TestBody (this=<optimized out>) at testxxx.cc:132
132 mass0[ievt] = sqrt( p0 * p0 - p1 * p1 - p2 * p2 - p3 * p3 );
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.34-60.el9.x86_64 libgcc-11.3.1-4.3.el9.alma.x86_64 libstdc++-11.3.1-4.3.el9.alma.x86_64
(gdb) where
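The crash site is the invariant-mass check at testxxx.cc:132. A plausible mechanism, shown here as a hedged sketch with a hypothetical helper (not the actual test code): in single precision the argument of sqrt can round to a tiny negative value for an on-shell momentum, raising FE_INVALID, i.e. SIGFPE when FP traps are enabled.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical sketch (not the testxxx.cc code): compute an invariant
// mass from a four-momentum. In single precision, the difference
// p0*p0 - p1*p1 - p2*p2 - p3*p3 can round to a tiny negative number
// for an on-shell momentum, and sqrt of that raises FE_INVALID.
// Clamping the argument at 0 avoids the trap.
float invariantMass( float p0, float p1, float p2, float p3 )
{
  const float m2 = p0 * p0 - p1 * p1 - p2 * p2 - p3 * p3;
  return std::sqrt( std::fmax( m2, 0.f ) ); // clamp tiny negative rounding artefacts
}
```

For example, with p0=1 and p1=1.0000001f (which rounds up to 1+2^-23), m2 evaluates to a small negative float; the clamp maps it to 0 instead of trapping.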
…, this seems to avoid the FPE madgraph5#736
Revert "[icx] in gg_tt.mad cudacpp.mk, switch on -g (while keeping -O3) to debug FPE madgraph5#736" This reverts commit 901ddab.
…tile" workaround for madgraph5#736
…adgraph5#735 and madgraph5#736 - but a few (undetected) FPEs still take place

STARTED AT Tue Jul 25 03:05:48 PM CEST 2023
./tput/teeThroughputX.sh -mix -hrd -makej -eemumu -ggtt -ggttg -ggttgg -gqttq -ggttggg -makeclean
ENDED(1) AT Tue Jul 25 04:01:00 PM CEST 2023 [Status=0]
./tput/teeThroughputX.sh -flt -hrd -makej -eemumu -ggtt -ggttgg -inlonly -makeclean
ENDED(2) AT Tue Jul 25 04:17:52 PM CEST 2023 [Status=0]
./tput/teeThroughputX.sh -makej -eemumu -ggtt -ggttg -gqttq -ggttgg -ggttggg -flt -bridge -makeclean
ENDED(3) AT Tue Jul 25 04:28:37 PM CEST 2023 [Status=0]
./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -flt -rmbhst
ENDED(4) AT Tue Jul 25 04:31:55 PM CEST 2023 [Status=0]
./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -flt -curhst
ENDED(5) AT Tue Jul 25 04:35:11 PM CEST 2023 [Status=0]

Example:
runExe /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/ee_mumu.mad/SubProcesses/P1_epem_mupmum/build.avx2_f_inl0_hrd0/runTest.exe
-Floating Point Exception (CPU neppV=8): 'unknown' ievt=-1
+Floating Point Exception (CPU neppV=8): 'ixxxxx' ievt=0

May be reproduced with ./tput/teeThroughputX.sh -eemumu -fltonly
… (exit 1 instead of exit 0), see madgraph5#736
…o not go undetected (exit 1 instead of exit 0), see madgraph5#736
for f in $(gitls */SubProcesses/testxxx.cc); do \cp ee_mumu.mad/SubProcesses/testxxx.cc $f; done
…bug FPE madgraph5#736

[avalassi@itscrd90 icx2023/cvmfs] /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx> make cleanall; make -j -f cudacpp.mk FPTYPE=f AVX=avx2
...
Running main() from /data/avalassi/GPU2023/madgraph4gpuX/test/googletest/googletest/src/gtest_main.cc
[New Thread 0x7fffedea5000 (LWP 375028)]
[New Thread 0x7fffed6a4000 (LWP 375029)]
[==========] Running 6 tests from 6 test suites.
[----------] Global test environment set-up.
[----------] 1 test from SIGMA_SM_GG_TTX_CPU_XXX
[ RUN ] SIGMA_SM_GG_TTX_CPU_XXX.testxxx
Thread 1 "runTest.exe" received signal SIGFPE, Arithmetic exception.
0x000000000041373f in mg5amcCpu::fpternary(int __vector(8) const&, float __vector(8) const&, float const&) (mask=..., a=..., b=<optimized out>) at ../../src/mgOnGpuVectors.h:490
490 for( int i = 0; i < neppV; i++ ) out[i] = ( mask[i] ? a[i] : b );
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.34-60.el9.x86_64 libgcc-11.3.1-4.3.el9.alma.x86_64 libstdc++-11.3.1-4.3.el9.alma.x86_64 nvidia-driver-cuda-libs-530.30.02-1.el9.x86_64
(gdb) where
0 0x000000000041373f in mg5amcCpu::fpternary(int __vector(8) const&, float __vector(8) const&, float const&) (mask=..., a=..., b=<optimized out>) at ../../src/mgOnGpuVectors.h:490
1 mg5amcCpu::fpmax(float __vector(8) const&, float const&) (a=..., b=<optimized out>) at ../../src/mgOnGpuVectors.h:650
2 mg5amcCpu::ixxxxx<mg5amcCpu::KernelAccessMomenta<false>, mg5amcCpu::KernelAccessWavefunctions<false> > ( momenta=momenta@entry=0x103cac0, fmass=<optimized out>, nhel=nhel@entry=1, nsf=nsf@entry=-1, wavefunctions=wavefunctions@entry=0x7fffffff9a80, ipar=ipar@entry=0) at ../../src/HelAmps_sm.h:279
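The backtrace lands in fpternary, the elementwise mask ? a : b select that fpmax is built on. A minimal self-contained sketch of that pattern, assuming gcc/clang vector extensions (the signatures and the neppV=8 width are taken from the trace; this is a sketch, not the actual mgOnGpuVectors.h source):

```cpp
#include <cassert>

constexpr int neppV = 8; // assumption: 8 floats per SIMD vector, as in the neppV=8 trace

// gcc/clang vector-extension types (sketch; real types live in mgOnGpuVectors.h)
typedef float fptype_v __attribute__( ( vector_size( neppV * sizeof( float ) ) ) );
typedef int bool_v __attribute__( ( vector_size( neppV * sizeof( int ) ) ) );

// Elementwise "mask ? a : b" with a scalar b, as at mgOnGpuVectors.h:490
inline fptype_v fpternary( const bool_v& mask, const fptype_v& a, const float& b )
{
  fptype_v out;
  for( int i = 0; i < neppV; i++ ) out[i] = ( mask[i] ? a[i] : b );
  return out;
}

// Elementwise max of a vector and a scalar, built on fpternary (frame 1 of the trace)
inline fptype_v fpmax( const fptype_v& a, const float& b )
{
  const fptype_v bv = fptype_v{} + b;   // broadcast the scalar into a vector
  return fpternary( a > bv, a, b );     // float comparison yields an int vector mask
}
```

The point of the pattern is that the comparison produces an integer mask vector, so the select itself does no floating-point arithmetic; the FPE reported here therefore suggests the compiler speculated neighbouring FP work into this code.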
…icx with another volatile for square roots (add also a volatile fpsqrt)

Now ixxxxx succeeds and the FPE moves to vxxxxx

make cleanall; make -j -f cudacpp.mk FPTYPE=f AVX=avx2
Running main() from /data/avalassi/GPU2023/madgraph4gpuX/test/googletest/googletest/src/gtest_main.cc
[==========] Running 6 tests from 6 test suites.
[----------] Global test environment set-up.
[----------] 1 test from SIGMA_SM_GG_TTX_CPU_XXX
[ RUN ] SIGMA_SM_GG_TTX_CPU_XXX.testxxx
Floating Point Exception (CPU neppV=8): 'vxxxxx' ievt=0
…icx with another volatile for square roots

Now vxxxxx succeeds and the FPE moves to oxxxxx

make cleanall; make -j -f cudacpp.mk FPTYPE=f AVX=avx2
Running main() from /data/avalassi/GPU2023/madgraph4gpuX/test/googletest/googletest/src/gtest_main.cc
[==========] Running 6 tests from 6 test suites.
[----------] Global test environment set-up.
[----------] 1 test from SIGMA_SM_GG_TTX_CPU_XXX
[ RUN ] SIGMA_SM_GG_TTX_CPU_XXX.testxxx
Floating Point Exception (CPU neppV=8): 'oxxxxx' ievt=0
…icx with another volatile for square roots (as done in ixxxxx)
Now oxxxxx succeeds and the FPE moves to another part of ixxxxx
make cleanall; make -j -f cudacpp.mk FPTYPE=f AVX=avx2
Running main() from /data/avalassi/GPU2023/madgraph4gpuX/test/googletest/googletest/src/gtest_main.cc
[==========] Running 6 tests from 6 test suites.
[----------] Global test environment set-up.
[----------] 1 test from SIGMA_SM_GG_TTX_CPU_XXX
[ RUN ] SIGMA_SM_GG_TTX_CPU_XXX.testxxx
Floating Point Exception (CPU neppV=8): 'ixxxxx' ievt=16
(gdb) where
0x00000000004135a4 in mg5amcCpu::ixxxxx<mg5amcCpu::KernelAccessMomenta<false>, mg5amcCpu::KernelAccessWavefunctions<false> > (
momenta=momenta@entry=0x103cf00, fmass=500, nhel=nhel@entry=1, nsf=nsf@entry=-1, wavefunctions=wavefunctions@entry=0x7fffffff9a80,
ipar=ipar@entry=0) at ../../src/HelAmps_sm.h:208
208 const fptype_sv pp = fpmin( pvec0, fpsqrt( pvec1 * pvec1 + pvec2 * pvec2 + pvec3 * pvec3 ) );
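The recurring fix in these commits routes the sqrt argument through a volatile. A hypothetical scalar sketch of the idea (the real code operates on the vector types in mgOnGpuVectors.h):

```cpp
#include <cassert>
#include <cmath>

// Hypothetical scalar sketch of the "volatile" workaround: copying the
// argument through a volatile acts as an optimization barrier, which
// (assumption) prevents icx from hoisting or speculating the sqrt into
// a code path where the argument could be negative and raise an FPE.
inline float fpsqrt_volatile( const float v )
{
  volatile float tmp = v; // barrier: the compiler must materialize the value here
  return std::sqrt( tmp );
}
```

This trades a little performance (the value is forced through memory) for well-defined evaluation order, which is why the commits apply it surgically, one sqrt call site at a time.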
…xxx for icx with another volatile for square roots
Now the FPE moves to another part of ixxxxx
make cleanall; make -j -f cudacpp.mk FPTYPE=f AVX=avx2
(gdb) where
0 0x00000000004137ff in mg5amcCpu::fpsqrt(float __vector(8) const&) (v=...) at ../../src/mgOnGpuVectors.h:253
1 mg5amcCpu::ixxxxx<mg5amcCpu::KernelAccessMomenta<false>, mg5amcCpu::KernelAccessWavefunctions<false> > (
momenta=momenta@entry=0x103cf00, fmass=<optimized out>, nhel=nhel@entry=1, nsf=nsf@entry=-1,
wavefunctions=wavefunctions@entry=0x7fffffff9a80, ipar=ipar@entry=0) at ../../src/HelAmps_sm.h:264
…icx with another volatile for square roots (add also a volatile fpsqrt)
Now ixxxxx succeeds and the FPE moves to another part of vxxxxx
make cleanall; make -j -f cudacpp.mk FPTYPE=f AVX=avx2
Running main() from /data/avalassi/GPU2023/madgraph4gpuX/test/googletest/googletest/src/gtest_main.cc
[==========] Running 6 tests from 6 test suites.
[----------] Global test environment set-up.
[----------] 1 test from SIGMA_SM_GG_TTX_CPU_XXX
[ RUN ] SIGMA_SM_GG_TTX_CPU_XXX.testxxx
Floating Point Exception (CPU neppV=8): 'vxxxxx' ievt=16
(gdb) where
0 0x000000000041114c in mg5amcCpu::vxxxxx<mg5amcCpu::KernelAccessMomenta<false>, mg5amcCpu::KernelAccessWavefunctions<false> > (
momenta=momenta@entry=0x103cf00, vmass=500,
vmass@entry=<error reading variable: That operation is not available on integers of more than 8 bytes.>, nhel=nhel@entry=1,
nsv=nsv@entry=-1, wavefunctions=wavefunctions@entry=0x7fffffff96c0, ipar=ipar@entry=0) at ../../src/HelAmps_sm.h:464
1 0x00000000004431e4 in SIGMA_SM_GG_TTX_CPU_XXX_testxxx_Test::TestBody (this=<optimized out>) at testxxx.cc:372
…icx with another volatile for square roots
Now vxxxxx succeeds and the FPE moves to another part of oxxxxx
make cleanall; make -j -f cudacpp.mk FPTYPE=f AVX=avx2
Running main() from /data/avalassi/GPU2023/madgraph4gpuX/test/googletest/googletest/src/gtest_main.cc
[==========] Running 6 tests from 6 test suites.
[----------] Global test environment set-up.
[----------] 1 test from SIGMA_SM_GG_TTX_CPU_XXX
[ RUN ] SIGMA_SM_GG_TTX_CPU_XXX.testxxx
Floating Point Exception (CPU neppV=8): 'oxxxxx' ievt=16
(gdb) where
0 0x00000000004126d5 in mg5amcCpu::oxxxxx<mg5amcCpu::KernelAccessMomenta<false>, mg5amcCpu::KernelAccessWavefunctions<false> > (
momenta=momenta@entry=0x103cf00, fmass=500,
fmass@entry=<error reading variable: That operation is not available on integers of more than 8 bytes.>, nhel=nhel@entry=1,
nsf=nsf@entry=-1, wavefunctions=wavefunctions@entry=0x7fffffff9c80, ipar=ipar@entry=0) at ../../src/HelAmps_sm.h:622
1 0x00000000004434c6 in SIGMA_SM_GG_TTX_CPU_XXX_testxxx_Test::TestBody (this=<optimized out>) at testxxx.cc:390
…xxx for icx with another volatile for square roots
Now the FPE moves to another part of oxxxxx
make cleanall; make -j -f cudacpp.mk FPTYPE=f AVX=avx2
Running main() from /data/avalassi/GPU2023/madgraph4gpuX/test/googletest/googletest/src/gtest_main.cc
[==========] Running 6 tests from 6 test suites.
[----------] Global test environment set-up.
[----------] 1 test from SIGMA_SM_GG_TTX_CPU_XXX
[ RUN ] SIGMA_SM_GG_TTX_CPU_XXX.testxxx
Floating Point Exception (CPU neppV=8): 'oxxxxx' ievt=16
(gdb) where
0 0x000000000041291f in mg5amcCpu::fpsqrt(float __vector(8) const&) (v=...) at ../../src/mgOnGpuVectors.h:253
1 mg5amcCpu::oxxxxx<mg5amcCpu::KernelAccessMomenta<false>, mg5amcCpu::KernelAccessWavefunctions<false> > (
momenta=momenta@entry=0x103cf00, fmass=<error reading variable: That operation is not available on integers of more than 8 bytes.>,
nhel=nhel@entry=1, nsf=nsf@entry=-1, wavefunctions=wavefunctions@entry=0x7fffffff9c80, ipar=ipar@entry=0)
at ../../src/HelAmps_sm.h:679
Revert "[icx] in gg_tt.mad cudacpp.mk, switch on -g (while keeping -O3) to debug FPE madgraph5#736" This reverts commit 9a5a5bc.
…test.mk has also changed now
./tput/teeThroughputX.sh -ggtt -flt -makej -makeclean
…bug FPE madgraph5#736

make cleanall; make -j -f cudacpp.mk FPTYPE=f AVX=sse4
Running main() from /data/avalassi/GPU2023/madgraph4gpuX/test/googletest/googletest/src/gtest_main.cc
[==========] Running 6 tests from 6 test suites.
[----------] Global test environment set-up.
[----------] 1 test from SIGMA_SM_GG_TTX_CPU_XXX
[ RUN ] SIGMA_SM_GG_TTX_CPU_XXX.testxxx
Floating Point Exception (CPU neppV=4): 'unknown' ievt=-1
(gdb) where
0 0x00000000004173f8 in SIGMA_SM_GG_TTX_CPU_XXX_testxxx_Test::TestBody (this=<optimized out>) at testxxx.cc:133
1 0x00000000004c6ffc in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ()
… do not understand why it gives an FPE, honestly)
Now another FPE in sse4 moves again into ixxx...
make cleanall; make -j -f cudacpp.mk FPTYPE=f AVX=sse4
Running main() from /data/avalassi/GPU2023/madgraph4gpuX/test/googletest/googletest/src/gtest_main.cc
[==========] Running 6 tests from 6 test suites.
[----------] Global test environment set-up.
[----------] 1 test from SIGMA_SM_GG_TTX_CPU_XXX
[ RUN ] SIGMA_SM_GG_TTX_CPU_XXX.testxxx
Floating Point Exception (CPU neppV=4): 'ixxxxx' ievt=0
(gdb) where
0 0x0000000000411a60 in mg5amcCpu::fpsqrt(float __vector(4) const volatile&) (v=...) at ../../src/mgOnGpuVectors.h:244
1 mg5amcCpu::ixxxxx<mg5amcCpu::KernelAccessMomenta<false>, mg5amcCpu::KernelAccessWavefunctions<false> > (
momenta=momenta@entry=0x101b8c0, fmass=<optimized out>, nhel=nhel@entry=1, nsf=nsf@entry=-1,
wavefunctions=wavefunctions@entry=0x7fffffff9de0, ipar=ipar@entry=0) at ../../src/HelAmps_sm.h:288
2 0x000000000043f30f in SIGMA_SM_GG_TTX_CPU_XXX_testxxx_Test::TestBody (this=<optimized out>) at testxxx.cc:340
…qrt (I do not understand why it gives an FPE, honestly)
This now fixes the FPTYPE=f AVX=sse4 runTest.exe on icx...
./tput/teeThroughputX.sh -ggtt -flt -makej -makeclean
Revert "[icx] in gg_tt.mad cudacpp.mk, switch on -g (while keeping -O3) to debug FPE madgraph5#736" This reverts commit e3af119.
… etc - and include formatting fixes
This has become another very complex FPE fixing campaign... and not only in ixx/oxx/vxx. The icx optimizer, now based on clang17, is doing even stranger things. In some cases I have data which is supposed to be exactly 0, and I am unable to take a sqrt of it, so I do it ONLY if it is GREATER than 0. Rerunning all tests tonight; hopefully it will look better.
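The guard described above can be sketched as follows (names hypothetical, not the actual patch):

```cpp
#include <cassert>
#include <cmath>

// Sketch of the guard described above: take the sqrt ONLY if the value
// is strictly GREATER than 0; data that is mathematically 0 (or has
// rounded to a tiny negative value) maps to 0 without touching sqrt.
inline float safeSqrt( const float v )
{
  return ( v > 0.f ? std::sqrt( v ) : 0.f );
}
```

Unlike the volatile workaround, this changes the computed value only for negative inputs (where sqrt would have trapped anyway), so it is safe for data that should be exactly 0.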
…nt Exception" errors have disappeared

STARTED AT Wed Jul 26 01:32:01 AM CEST 2023
./tput/teeThroughputX.sh -mix -hrd -makej -eemumu -ggtt -ggttg -ggttgg -gqttq -ggttggg -makeclean
ENDED(1) AT Wed Jul 26 05:31:18 AM CEST 2023 [Status=0]
./tput/teeThroughputX.sh -flt -hrd -makej -eemumu -ggtt -ggttgg -inlonly -makeclean
ENDED(2) AT Wed Jul 26 05:58:22 AM CEST 2023 [Status=0]
./tput/teeThroughputX.sh -makej -eemumu -ggtt -ggttg -gqttq -ggttgg -ggttggg -flt -bridge -makeclean
ENDED(3) AT Wed Jul 26 06:12:34 AM CEST 2023 [Status=0]
./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -flt -rmbhst
ENDED(4) AT Wed Jul 26 06:16:54 AM CEST 2023 [Status=0]
./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -flt -curhst
ENDED(5) AT Wed Jul 26 06:21:13 AM CEST 2023 [Status=0]

Example diff:
-Floating Point Exception (CPU neppV=4): 'unknown' ievt=-1
+[ PASSED ] 6 tests.

There is some degradation of performance, but only for simple 2->2 processes. For more complex processes, performance is essentially the same. Somewhat surprisingly, double (double FP) results do not seem to be affected? Only float (single FP) results seem to show some difference in performance and disassembly symbols?
STARTED AT Wed Jul 26 06:25:39 AM CEST 2023
ENDED AT Wed Jul 26 10:43:48 AM CEST 2023 Status=0

24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_m_inl0_hrd0.txt
1 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_d_inl0_hrd0.txt
1 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_f_inl0_hrd0.txt
1 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_m_inl0_hrd0.txt
0 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_d_inl0_hrd0.txt

There is maybe a tiny degradation of performance, but only for the simpler physics processes
…est.mk to keep backward-compatibility to epoch1/epoch2 of gtest directory names madgraph5#125 and madgraph5#738
…h1/epoch2 fixes to the other 13 processes

for f in $(gitls */SubProcesses/cudacpp.mk); do \cp gg_tt.mad/SubProcesses/cudacpp.mk $f; done
for f in $(gitls */test/cudacpp_test.mk); do \cp gg_tt.mad/test/cudacpp_test.mk $f; done
I have rerun all tests overnight - now all FPEs are fixed in icpx. I have also made minor changes in the gtest handling of compiler-specific build directories (#125 and #738) to allow backward compatibility to epoch1/epoch2, which was previously failing the CI. This is now ready; I will self-merge and document the full contents a posteriori.
I have merged this MR #737 with another set of comprehensive FPE fixes. Here is some documentation:
It should also be noted that another whole set of FPEs is still pending in #733. These are invalid (sqrt?), underflow and overflow FPEs. Initial investigations in WIP MR #706, in any case, suggest that this is due to some bug in coupling propagation in non-SM processes, rather than to the ixx/oxx/vxx functions. To be followed up... this is high priority as it is a blocker for ATLAS pp->ttW. Voila, that's all for the documentation of this MR. cc @roiser @oliviermattelaer @hageboeck @Jooorgen @zeniheisser
…ng icx madgraph5#737 into upstream/master)
This MR includes a few minor fixes for new platforms (#734), in particular icx2023.2 and clang16.