Conversation

Owner

@Jooorgen Jooorgen commented Aug 9, 2023

No description provided.

valassi added 30 commits July 17, 2023 08:45
…_ZERO (see firemodels/fds/issues/5638 on gh) with -ffpe flags

However, the build gives these warnings:
  ccache /cvmfs/sft.cern.ch/lcg/releases/gcc/11.2.0-ad950/x86_64-centos8/bin/g++  -O3  -std=c++17 -I. -I../../src -I../../../../../test/googletest/install/include -I../../../../../test/googletest/install/include -Wall -Wshadow -Wextra -ffast-math  -fopenmp -march=skylake-avx512 -mprefer-vector-width=256  -DMGONGPU_FPTYPE_DOUBLE -DMGONGPU_FPTYPE2_DOUBLE -ffpe-trap=invalid,zero,overflow -ffpe-summary=none  -fPIC -c testxxx.cc -o testxxx.o
  cc1plus: warning: command-line option ‘-ffpe-trap=invalid,zero,overflow’ is valid for Fortran but not for C++
  cc1plus: warning: command-line option ‘-ffpe-summary=none’ is valid for Fortran but not for C++
I will revert
Revert "[fpe] in ggttsa cudacpp.mk, try to debug madgraph5#701 IEEE_DIVIDE_BY_ZERO (see firemodels/fds/issues/5638 on gh) with -ffpe flags"
This reverts commit d75e426.
…als to debug madgraph5#701 (see https://stackoverflow.com/a/17473528)

This works as expected:
  [avalassi@itscrd80 gcc11.2/cvmfs] /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_tt.sa/SubProcesses/P1_Sigma_sm_gg_ttx> ./runTest.exe --gtest_filter=*xxx
  Running main() from /data/avalassi/GPU2023/madgraph4gpuX/test/googletest/googletest/src/gtest_main.cc
  Note: Google Test filter = *xxx
  [==========] Running 2 tests from 2 test suites.
  [----------] Global test environment set-up.
  [----------] 1 test from SIGMA_SM_GG_TTX_CPU_XXX
  [ RUN      ] SIGMA_SM_GG_TTX_CPU_XXX.testxxx
  Floating point exception (core dumped)
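
Since the -ffpe-trap flags are rejected for C++, traps have to be switched on at run time instead. A minimal sketch of one common way to do this on Linux with glibc follows. It assumes the GNU extensions feenableexcept and fegetexcept are available. The helper names enableFpeTraps and fpeTrapsEnabled are hypothetical, and this is not necessarily what the commit above does.

```cpp
// Sketch only: enable floating-point traps at run time (Linux/glibc).
#include <cassert>
#include <cfenv>  // FE_INVALID, FE_DIVBYZERO, FE_OVERFLOW
#include <fenv.h> // feenableexcept, fegetexcept: GNU extensions, not standard C++

// Hypothetical helper: trap on the same conditions as -ffpe-trap=invalid,zero,overflow.
// Returns the set of traps that was enabled before the call.
inline int enableFpeTraps()
{
  const int previous = feenableexcept( FE_INVALID | FE_DIVBYZERO | FE_OVERFLOW );
  assert( previous != -1 ); // feenableexcept returns -1 on failure
  return previous;
}

// Hypothetical helper: true if all three traps are currently enabled.
inline bool fpeTrapsEnabled()
{
  const int wanted = FE_INVALID | FE_DIVBYZERO | FE_OVERFLOW;
  return ( fegetexcept() & wanted ) == wanted;
}
```

Once the traps are enabled, an invalid operation, a division by zero or an overflow delivers SIGFPE; with no handler installed the process is killed, as in the "core dumped" output above.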
…signal handler for madgraph5#701

  [avalassi@itscrd80 gcc11.2/cvmfs] /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_tt.sa/SubProcesses/P1_Sigma_sm_gg_ttx> make -j AVX=512y
  ...
  [avalassi@itscrd80 gcc11.2/cvmfs] /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_tt.sa/SubProcesses/P1_Sigma_sm_gg_ttx> ./runTest.exe --gtest_filter=*xxx
  Running main() from /data/avalassi/GPU2023/madgraph4gpuX/test/googletest/googletest/src/gtest_main.cc
  Note: Google Test filter = *xxx
  [==========] Running 2 tests from 2 test suites.
  [----------] Global test environment set-up.
  [----------] 1 test from SIGMA_SM_GG_TTX_CPU_XXX
  [ RUN      ] SIGMA_SM_GG_TTX_CPU_XXX.testxxx
  Floating Point Exception (CPU neppV=4): 'ipzxxx'
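
A message like the one above can be produced by a SIGFPE handler that prints the name of the function currently under test. The following is a minimal sketch under that assumption. All names (g_currentFunction, setCurrentFunction, fpeHandler, installFpeHandler) are hypothetical and the real handler in the test may differ.

```cpp
// Sketch only: a SIGFPE handler that reports which function was being tested.
#include <cassert>
#include <cstring>
#include <signal.h> // sigaction, SIGFPE
#include <unistd.h> // write, _exit

// Hypothetical bookkeeping: the test stores the name before calling each function.
static char g_currentFunction[32] = "unknown";

inline void setCurrentFunction( const char* name )
{
  assert( name != nullptr );
  std::strncpy( g_currentFunction, name, sizeof( g_currentFunction ) - 1 );
  g_currentFunction[sizeof( g_currentFunction ) - 1] = '\0';
}

inline const char* currentFunction() { return g_currentFunction; }

// Only async-signal-safe calls (write, _exit) are used inside the handler.
extern "C" void fpeHandler( int /*signum*/ )
{
  const char head[] = "Floating Point Exception: '";
  ssize_t n = write( STDERR_FILENO, head, sizeof( head ) - 1 );
  n = write( STDERR_FILENO, g_currentFunction, std::strlen( g_currentFunction ) );
  n = write( STDERR_FILENO, "'\n", 2 );
  (void)n;
  _exit( 1 ); // returning from a SIGFPE handler would re-execute the faulting instruction
}

inline bool installFpeHandler()
{
  struct sigaction sa;
  std::memset( &sa, 0, sizeof( sa ) );
  sa.sa_handler = fpeHandler;
  sigemptyset( &sa.sa_mask );
  return sigaction( SIGFPE, &sa, nullptr ) == 0;
}
```

In this sketch a test would call installFpeHandler() once, then setCurrentFunction( "ipzxxx" ) immediately before exercising ipzxxx.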
…CPP_RUNTIME_DISABLEFPE is set

Note: as observed last week, a debug build already triggers an FPE in ixxxxx

[avalassi@itscrd80 gcc11.2/cvmfs] /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_tt.sa/SubProcesses/P1_Sigma_sm_gg_ttx> ./runTest.exe
Running main() from /data/avalassi/GPU2023/madgraph4gpuX/test/googletest/googletest/src/gtest_main.cc
[==========] Running 3 tests from 3 test suites.
[----------] Global test environment set-up.
[----------] 1 test from SIGMA_SM_GG_TTX_CPU_XXX
[ RUN      ] SIGMA_SM_GG_TTX_CPU_XXX.testxxx
Floating Point Exception (CPU neppV=4): 'ixxxxx'

Conversely, in the same debug build, disabling FPEs with the env variable makes all tests succeed

[avalassi@itscrd80 gcc11.2/cvmfs] /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_tt.sa/SubProcesses/P1_Sigma_sm_gg_ttx> CUDACPP_RUNTIME_DISABLEFPE=1 ./runTest.exe
Running main() from /data/avalassi/GPU2023/madgraph4gpuX/test/googletest/googletest/src/gtest_main.cc
[==========] Running 3 tests from 3 test suites.
[----------] Global test environment set-up.
[----------] 1 test from SIGMA_SM_GG_TTX_CPU_XXX
[ RUN      ] SIGMA_SM_GG_TTX_CPU_XXX.testxxx
[       OK ] SIGMA_SM_GG_TTX_CPU_XXX.testxxx (0 ms)
[----------] 1 test from SIGMA_SM_GG_TTX_CPU_XXX (0 ms total)

[----------] 1 test from SIGMA_SM_GG_TTX_CPU_MISC
[ RUN      ] SIGMA_SM_GG_TTX_CPU_MISC.testmisc
[       OK ] SIGMA_SM_GG_TTX_CPU_MISC.testmisc (0 ms)
[----------] 1 test from SIGMA_SM_GG_TTX_CPU_MISC (0 ms total)

[----------] 1 test from SIGMA_SM_GG_TTX_CPU/MadgraphTest
[ RUN      ] SIGMA_SM_GG_TTX_CPU/MadgraphTest.CompareMomentaAndME/0
INFO: Opening reference file ../../test/ref/dump_CPUTest.Sigma_sm_gg_ttx.txt
INFO: The application is built for skylake-avx512 (AVX512VL) and the host supports it
INFO: The application is built for skylake-avx512 (AVX512VL) and the host supports it
[       OK ] SIGMA_SM_GG_TTX_CPU/MadgraphTest.CompareMomentaAndME/0 (34 ms)
[----------] 1 test from SIGMA_SM_GG_TTX_CPU/MadgraphTest (34 ms total)

[----------] Global test environment tear-down
[==========] 3 tests from 3 test suites ran. (35 ms total)
[  PASSED  ] 3 tests.
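
The run-time switch used above can be sketched as a simple environment check. This is an illustration only: the helper name fpeDisabledByEnv is hypothetical, and it assumes that the variable merely needs to be set (the log only shows it being set to 1).

```cpp
// Sketch only: decide at run time whether FPE traps should be left off.
#include <cassert>
#include <cstdlib> // std::getenv

// Assumption: the variable only needs to be set; its value is not inspected here.
inline bool fpeDisabledByEnv( const char* name = "CUDACPP_RUNTIME_DISABLEFPE" )
{
  assert( name != nullptr );
  return std::getenv( name ) != nullptr;
}
```

A program would call this once at start-up and skip the code that enables traps when it returns true.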
No change in runTest behaviour, FPEs by default, succeeds if FPEs disabled
…et cast)

No change in runTest behaviour, FPEs by default, succeeds if FPEs disabled
…handler).

This also includes a resetHstMomentaToPar0, which is commented out for the moment.
The idea was to modify the momenta before each xxx call, to ensure that they are all consistent.
But I will instead implement a more solid fix.

No change in runTest behaviour, FPEs by default, succeeds if FPEs disabled
In debug mode this fails like this

[==========] Running 3 tests from 3 test suites.
[----------] Global test environment set-up.
[----------] 1 test from SIGMA_SM_GG_TTX_CPU_XXX
[ RUN      ] SIGMA_SM_GG_TTX_CPU_XXX.testxxx
nsp=-1 ievt=0: 500, 0, 0, 500,
IXXXXX: sqp0p3={ -0, -0, -0, -0 }
Floating Point Exception (CPU neppV=4): 'ixxxxx' ievt=0

Note: last week the sqp0p3 were not all 0. I am not sure what I was doing (I was using hstReset?).
Anyway: I will revert this commit and the previous one. We need a much more solid fix in all xxx functions.
…l start from scratch

Revert "[fpe] in ggtt.sa HelAmps_sm.h, add some debugging printouts for ixxxxx"
This reverts commit fdacc5e

Revert "[fpe] in ggtt.sa HelAmps_sm.h, first (OLD!) attempt of BUG FIX FOR madgraph5#701 in function ixxxxx"
This reverts commit 7674824.
The build fails because maskand is also defined in testmisc.cc
This now shows (in debug builds) that the first test executed is ixxxxx and that it immediately fails with an FPE

[==========] Running 3 tests from 3 test suites.
[----------] Global test environment set-up.
[----------] 1 test from SIGMA_SM_GG_TTX_CPU_XXX
[ RUN      ] SIGMA_SM_GG_TTX_CPU_XXX.testxxx

nsp=-1 ievt=0: 500, 0, 0, 500,
Prepare test ixxxxx ievt=0
Floating Point Exception (CPU neppV=4): 'ixxxxx' ievt=0
…ptype& r )" to create cx vectors from fp scalars
…ion ixxxxx

This builds and runs ok. The FPE (always in debug mode) is now moved from ixxxxx to the next ipzxxx

[==========] Running 3 tests from 3 test suites.
[----------] Global test environment set-up.
[----------] 1 test from SIGMA_SM_GG_TTX_CPU_XXX
[ RUN      ] SIGMA_SM_GG_TTX_CPU_XXX.testxxx
nsp=-1 ievt=0: 500, 0, 0, 500,
Prepare test ixxxxx ievt=0
Prepare test ipzxxx ievt=0
Floating Point Exception (CPU neppV=4): 'ipzxxx' ievt=0
…ginning of each test (prepare to modify momenta for ipzxxx)

No change in runTest behaviour, FPEs by default in ipzxxx, succeeds if FPEs disabled
…respecting the relevant assumptions

Assumption example for ipzxxx: (FMASS == 0) and (PX == PY == 0 and E == +PZ > 0)

This is done by testing one ievt and copying all momenta to that ievt
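
The ipzxxx assumption quoted above can be written as a predicate on one event. This is a minimal sketch: the Momentum struct and the function name are hypothetical, and the real test works on the momenta buffers of the code base rather than on a struct like this.

```cpp
// Sketch only: check that one event respects the ipzxxx assumptions.
#include <cassert>

// Hypothetical container for one four-momentum (E, PX, PY, PZ).
struct Momentum
{
  double e, px, py, pz;
};

// (FMASS == 0) and (PX == PY == 0 and E == +PZ > 0)
inline bool satisfiesIpzxxxAssumptions( double fmass, const Momentum& p )
{
  return fmass == 0 && p.px == 0 && p.py == 0 && p.e == p.pz && p.pz > 0;
}
```

The momenta printed in the logs for ievt=0 (500, 0, 0, 500) satisfy this predicate for a massless particle.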

NB: after adding the workaround for ipzxxx, now the test fails in vxxxxx, which is the real issue in madgraph5#701
[==========] Running 3 tests from 3 test suites.
[----------] Global test environment set-up.
[----------] 1 test from SIGMA_SM_GG_TTX_CPU_XXX
[ RUN      ] SIGMA_SM_GG_TTX_CPU_XXX.testxxx
nsp=-1 ievt=0: 500, 0, 0, 500,
Prepare test ixxxxx ievt=0
Prepare test ipzxxx ievt=0
Prepare test vxxxxx ievt=0
Floating Point Exception (CPU neppV=4): 'vxxxxx' ievt=0
…ion vxxxxx

This builds and runs ok. The FPE (always in debug mode) is now moved from vxxxxx to the next oxxxxx

Running main() from /data/avalassi/GPU2023/madgraph4gpuX/test/googletest/googletest/src/gtest_main.cc
[==========] Running 3 tests from 3 test suites.
[----------] Global test environment set-up.
[----------] 1 test from SIGMA_SM_GG_TTX_CPU_XXX
[ RUN      ] SIGMA_SM_GG_TTX_CPU_XXX.testxxx
nsp=-1 ievt=0: 500, 0, 0, 500,
Prepare test ixxxxx ievt=0
Prepare test ipzxxx ievt=0
Prepare test vxxxxx ievt=0
Prepare test sxxxxx ievt=0
Prepare test oxxxxx ievt=0
Floating Point Exception (CPU neppV=4): 'oxxxxx' ievt=0
NB1: This also adds LIBFLAGS to link command for shared libraries
This is needed to avoid "hidden symbol `__gcov_init' in ...libgcov.a(_gcov.o) is referenced by DSO" errors

NB2: I will not add a gcov target to .mad makefiles (they have no debug target either yet)
…make clean'

Revert "[fpe] in ggt.sa .gitignore, add gcov suffixes to gitignore"
This reverts commit eb5594d.
…I want to profile is in template header HelAmps_sm.h and I am unable to show it

Revert "[fpe] in ggtt.sa cudacpp makefiles, remove files with gcov suffixes in 'make clean'"
This reverts commit fc120fa.

Revert "[fpe] in ggtt.sa cudacpp makefiles, add gcov target"
This reverts commit 709ec5d.
…ion oxxxxx

This builds ok. The FPE (always in debug mode) is now moved from oxxxxx to the next opzxxx

[==========] Running 3 tests from 3 test suites.
[----------] Global test environment set-up.
[----------] 1 test from SIGMA_SM_GG_TTX_CPU_XXX
[ RUN      ] SIGMA_SM_GG_TTX_CPU_XXX.testxxx
nsp=-1 ievt=0: 500, 0, 0, 500,
Prepare test ixxxxx ievt=0
Prepare test ipzxxx ievt=0
Prepare test vxxxxx ievt=0
Prepare test sxxxxx ievt=0
Prepare test oxxxxx ievt=0
Prepare test opzxxx ievt=0
Floating Point Exception (CPU neppV=4): 'opzxxx' ievt=0

HOWEVER, I introduced a functional bug in oxxxxx - the test fails if I disable FPEs
It builds, but the tests still fail

NB: there are two different sets of ip and im, depending on whether pp=0 or pp>0 in oxxxxx!
(And I should also check ixxxxx)
valassi and others added 29 commits July 26, 2023 00:00
Revert "[icx] in gg_tt.mad cudacpp.mk, switch on -g (while keeping -O3) to debug FPE madgraph5#736"
This reverts commit 9a5a5bc.
./tput/teeThroughputX.sh -ggtt -flt -makej -makeclean
…bug FPE madgraph5#736

make cleanall; make -j -f cudacpp.mk FPTYPE=f AVX=sse4

Running main() from /data/avalassi/GPU2023/madgraph4gpuX/test/googletest/googletest/src/gtest_main.cc
[==========] Running 6 tests from 6 test suites.
[----------] Global test environment set-up.
[----------] 1 test from SIGMA_SM_GG_TTX_CPU_XXX
[ RUN      ] SIGMA_SM_GG_TTX_CPU_XXX.testxxx
Floating Point Exception (CPU neppV=4): 'unknown' ievt=-1

(gdb) where
 0  0x00000000004173f8 in SIGMA_SM_GG_TTX_CPU_XXX_testxxx_Test::TestBody (this=<optimized out>) at testxxx.cc:133
 1  0x00000000004c6ffc in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ()
… do not understand why it gives an FPE, honestly)

Now the FPE in the sse4 build moves again, this time to ixxx...

make cleanall; make -j -f cudacpp.mk FPTYPE=f AVX=sse4

Running main() from /data/avalassi/GPU2023/madgraph4gpuX/test/googletest/googletest/src/gtest_main.cc
[==========] Running 6 tests from 6 test suites.
[----------] Global test environment set-up.
[----------] 1 test from SIGMA_SM_GG_TTX_CPU_XXX
[ RUN      ] SIGMA_SM_GG_TTX_CPU_XXX.testxxx
Floating Point Exception (CPU neppV=4): 'ixxxxx' ievt=0

(gdb) where
 0  0x0000000000411a60 in mg5amcCpu::fpsqrt(float __vector(4) const volatile&) (v=...) at ../../src/mgOnGpuVectors.h:244
 1  mg5amcCpu::ixxxxx<mg5amcCpu::KernelAccessMomenta<false>, mg5amcCpu::KernelAccessWavefunctions<false> > (
    momenta=momenta@entry=0x101b8c0, fmass=<optimized out>, nhel=nhel@entry=1, nsf=nsf@entry=-1,
    wavefunctions=wavefunctions@entry=0x7fffffff9de0, ipar=ipar@entry=0) at ../../src/HelAmps_sm.h:288
 2  0x000000000043f30f in SIGMA_SM_GG_TTX_CPU_XXX_testxxx_Test::TestBody (this=<optimized out>) at testxxx.cc:340
…qrt (I do not understand why it gives an FPE, honestly)

This now fixes the FPTYPE=f AVX=sse4 runTest.exe on icx...
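
The cause of the sqrt FPE is not established in these messages. One possible mechanism, offered purely as an assumption, is that vectorised code computes sqrt in every SIMD lane and applies the mask only afterwards, so a lane holding a negative value raises FE_INVALID even though its result is discarded. The sketch below emulates this with scalar loops and checks the flag with fetestexcept instead of trapping. All function names are hypothetical, and the volatile locals are only there to stop the compiler from optimising the sqrt away in this demonstration.

```cpp
// Sketch only: a discarded lane can still raise FE_INVALID in a masked sqrt.
#include <cassert>
#include <cfenv>
#include <cmath>

constexpr int neppV = 4; // number of lanes, as in "neppV=4" in the logs

// Emulates a SIMD select: sqrt is computed in EVERY lane, the mask is applied afterwards.
inline void maskedSqrtNaive( const float* v, const bool* mask, float* out )
{
  for( int i = 0; i < neppV; i++ )
  {
    volatile float in = v[i];
    volatile float s = std::sqrt( in ); // raises FE_INVALID if in < 0
    out[i] = mask[i] ? s : 0.f;
  }
}

// Sanitises the input first: masked-off lanes take sqrt(0), which raises nothing.
inline void maskedSqrtSafe( const float* v, const bool* mask, float* out )
{
  for( int i = 0; i < neppV; i++ )
  {
    volatile float in = mask[i] ? v[i] : 0.f;
    volatile float s = std::sqrt( in );
    out[i] = s;
  }
}

// Runs f and reports whether it raised the FE_INVALID flag (no trap involved).
template<typename F>
bool raisesInvalid( F&& f )
{
  const int rc = std::feclearexcept( FE_ALL_EXCEPT );
  assert( rc == 0 );
  (void)rc;
  f();
  return std::fetestexcept( FE_INVALID ) != 0;
}
```

Both variants return the same values; only the naive one sets the flag, and it would therefore deliver SIGFPE if the invalid trap were enabled.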
./tput/teeThroughputX.sh -ggtt -flt -makej -makeclean
Revert "[icx] in gg_tt.mad cudacpp.mk, switch on -g (while keeping -O3) to debug FPE madgraph5#736"
This reverts commit e3af119.
…nt Exception" errors have disappeared

STARTED  AT Wed Jul 26 01:32:01 AM CEST 2023
./tput/teeThroughputX.sh -mix -hrd -makej -eemumu -ggtt -ggttg -ggttgg -gqttq -ggttggg -makeclean
ENDED(1) AT Wed Jul 26 05:31:18 AM CEST 2023 [Status=0]
./tput/teeThroughputX.sh -flt -hrd -makej -eemumu -ggtt -ggttgg -inlonly -makeclean
ENDED(2) AT Wed Jul 26 05:58:22 AM CEST 2023 [Status=0]
./tput/teeThroughputX.sh -makej -eemumu -ggtt -ggttg -gqttq -ggttgg -ggttggg -flt -bridge -makeclean
ENDED(3) AT Wed Jul 26 06:12:34 AM CEST 2023 [Status=0]
./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -flt -rmbhst
ENDED(4) AT Wed Jul 26 06:16:54 AM CEST 2023 [Status=0]
./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -flt -curhst
ENDED(5) AT Wed Jul 26 06:21:13 AM CEST 2023 [Status=0]

Example diff:
-Floating Point Exception (CPU neppV=4): 'unknown' ievt=-1
+[  PASSED  ] 6 tests.

There is some degradation of performance, but only for simple 2->2 processes.
For more complex processes, performance is essentially the same.

Somewhat surprisingly, double (double FP) results do not seem to be affected?
Only float (single FP) results seem to show some difference in performance and disassembly symbols?
STARTED AT Wed Jul 26 06:25:39 AM CEST 2023
ENDED   AT Wed Jul 26 10:43:48 AM CEST 2023

Status=0

24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_m_inl0_hrd0.txt
1 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_d_inl0_hrd0.txt
1 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_f_inl0_hrd0.txt
1 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_m_inl0_hrd0.txt
0 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_d_inl0_hrd0.txt

There may be a tiny degradation of performance, but only for simpler physics processes
…est.mk to keep backward-compatibility to epoch1/epoch2 of gtest directory names madgraph5#125 and madgraph5#738
…h1/epoch2 fixes to the other 13 processes

for f in $(gitls */SubProcesses/cudacpp.mk); do \cp gg_tt.mad/SubProcesses/cudacpp.mk $f; done
for f in $(gitls */test/cudacpp_test.mk); do \cp gg_tt.mad/test/cudacpp_test.mk $f; done
Several fixes for icx2023.2 (including fixes for sqrt FPEs in ixx/oxx/vxx)
Revert "[jthip] regenerate ggttgg.mad after merging upstream/master - all ok (will revert the log)"
This reverts commit 9d5b6d9.
@Jooorgen Jooorgen merged commit 9bb4d3f into master Aug 9, 2023
3 participants