Conversation

@valassi valassi commented Oct 3, 2024

This is a WIP PR with a workaround for the FPE in vxxxxx on HIP (#1011).

It is WIP because:

  • must first release 1.00.00
  • must backport to codegen, regenerate code, run all tests
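
For context, the sketch below illustrates the kind of guard typically used to silence this class of FPE; it is a hypothetical example with made-up names (transversePolarisation, fptype, pt), not the actual change in this PR. The usual culprit in a wavefunction routine like vxxxxx is the pT = 0 special case, where a 0/0 expression in the branch that is ultimately discarded can still raise FE_INVALID or FE_DIVBYZERO once both sides of a selection are evaluated (for example after vectorisation or on a GPU).

// Hypothetical illustration only (not the code touched by this PR):
// avoid dividing by a transverse momentum that can be exactly zero,
// even in the branch whose result is discarded, so that no
// FE_DIVBYZERO / FE_INVALID is ever raised.
typedef double fptype; // stand-in for the precision-dependent type

inline void transversePolarisation( const fptype px, const fptype py, const fptype pt,
                                    fptype& ex, fptype& ey )
{
  // Dummy non-zero denominator when pt == 0: the divisions below are then safe
  // to evaluate unconditionally, and their results are only kept when pt != 0.
  const fptype denom = ( pt != 0 ? pt : 1 );
  ex = ( pt != 0 ? px / denom : 1 );
  ey = ( pt != 0 ? py / denom : 0 );
}

The dummy denominator makes px/denom and py/denom well defined for every event, which is what matters when the compiler or the GPU effectively computes both sides of the ternary.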

@valassi valassi self-assigned this Oct 3, 2024
@valassi valassi requested a review from a team as a code owner October 3, 2024 13:34
@valassi valassi linked an issue Oct 3, 2024 that may be closed by this pull request
@valassi valassi marked this pull request as draft October 3, 2024 13:34
@valassi valassi commented Oct 4, 2024

  • must first release 1.00.00

This is done: I have resynced this PR to the latest master.

I also fixed #1013 and included it here.

  • must backport to codegen, regenerate code, run all tests

The codegen has been backported and I have regenerated all processes.

I will run all tests later on

[amd] rerun 96 tput builds and tests on LUMI worker node (small-g 72h) with the workaround for HIP FPEs madgraph5#1011 - now all tests succeed

./tput/allTees.sh -hip

STARTED  AT Fri 04 Oct 2024 09:31:32 AM EEST
./tput/teeThroughputX.sh -mix -hrd -makej -eemumu -ggtt -ggttg -ggttgg -gqttq -ggttggg -makeclean  -nocuda
ENDED(1) AT Fri 04 Oct 2024 10:33:14 AM EEST [Status=0]
./tput/teeThroughputX.sh -flt -hrd -makej -eemumu -ggtt -ggttgg -inlonly -makeclean  -nocuda
ENDED(2) AT Fri 04 Oct 2024 11:09:17 AM EEST [Status=0]
./tput/teeThroughputX.sh -makej -eemumu -ggtt -ggttg -gqttq -ggttgg -ggttggg -flt -bridge -makeclean  -nocuda
ENDED(3) AT Fri 04 Oct 2024 11:17:27 AM EEST [Status=0]
./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -flt -rmbhst  -nocuda
ENDED(4) AT Fri 04 Oct 2024 11:19:15 AM EEST [Status=0]
SKIP './tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -flt -common  -nocuda'
ENDED(5) AT Fri 04 Oct 2024 11:19:15 AM EEST [Status=0]
./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -flt -common  -nocuda
ENDED(6) AT Fri 04 Oct 2024 11:21:02 AM EEST [Status=0]
./tput/teeThroughputX.sh -mix -hrd -makej -susyggtt -susyggt1t1 -smeftggtttt -heftggbb -makeclean  -nocuda
ENDED(7) AT Fri 04 Oct 2024 11:53:25 AM EEST [Status=0]

No errors found in logs

No FPEs or '{ }' found in logs

eemumu MEK (channelid array) processed 512 events across 2 channels { 1 : 256, 2 : 256 }
eemumu MEK (no multichannel) processed 512 events across 2 channels { no-multichannel : 512 }
ggttggg MEK (channelid array) processed 512 events across 1240 channels { 1 : 32, 2 : 32, 4 : 32, 5 : 32, 7 : 32, 8 : 32, 14 : 32, 15 : 32, 16 : 32, 18 : 32, 19 : 32, 20 : 32, 22 : 32, 23 : 32, 24 : 32, 26 : 32 }
ggttggg MEK (no multichannel) processed 512 events across 1240 channels { no-multichannel : 512 }
ggttgg MEK (channelid array) processed 512 events across 123 channels { 2 : 32, 3 : 32, 4 : 32, 5 : 32, 6 : 32, 7 : 32, 8 : 32, 9 : 32, 10 : 32, 11 : 32, 12 : 32, 13 : 32, 14 : 32, 15 : 32, 16 : 32, 17 : 32 }
ggttgg MEK (no multichannel) processed 512 events across 123 channels { no-multichannel : 512 }
ggttg MEK (channelid array) processed 512 events across 16 channels { 1 : 64, 2 : 32, 3 : 32, 4 : 32, 5 : 32, 6 : 32, 7 : 32, 8 : 32, 9 : 32, 10 : 32, 11 : 32, 12 : 32, 13 : 32, 14 : 32, 15 : 32 }
ggttg MEK (no multichannel) processed 512 events across 16 channels { no-multichannel : 512 }
ggtt MEK (channelid array) processed 512 events across 3 channels { 1 : 192, 2 : 160, 3 : 160 }
ggtt MEK (no multichannel) processed 512 events across 3 channels { no-multichannel : 512 }
gqttq MEK (channelid array) processed 512 events across 5 channels { 1 : 128, 2 : 96, 3 : 96, 4 : 96, 5 : 96 }
gqttq MEK (no multichannel) processed 512 events across 5 channels { no-multichannel : 512 }
heftggbb MEK (channelid array) processed 512 events across 4 channels { 1 : 128, 2 : 128, 3 : 128, 4 : 128 }
heftggbb MEK (no multichannel) processed 512 events across 4 channels { no-multichannel : 512 }
smeftggtttt MEK (channelid array) processed 512 events across 72 channels { 1 : 32, 2 : 32, 3 : 32, 4 : 32, 5 : 32, 6 : 32, 7 : 32, 8 : 32, 9 : 32, 10 : 32, 11 : 32, 12 : 32, 13 : 32, 14 : 32, 15 : 32, 16 : 32 }
smeftggtttt MEK (no multichannel) processed 512 events across 72 channels { no-multichannel : 512 }
susyggt1t1 MEK (channelid array) processed 512 events across 6 channels { 2 : 128, 3 : 96, 4 : 96, 5 : 96, 6 : 96 }
susyggt1t1 MEK (no multichannel) processed 512 events across 6 channels { no-multichannel : 512 }
susyggtt MEK (channelid array) processed 512 events across 3 channels { 1 : 192, 2 : 160, 3 : 160 }
susyggtt MEK (no multichannel) processed 512 events across 3 channels { no-multichannel : 512 }
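
(As a side note on what "No FPEs found in logs" means operationally: on Linux a test program can turn otherwise-silent FPEs into fatal SIGFPE signals by enabling trapping, for instance with the glibc feenableexcept extension. The snippet below is a minimal standalone sketch of that mechanism; whether the cudacpp test drivers enable trapping exactly this way is an assumption here, not something stated in this PR.)

// Minimal sketch (glibc/Linux only): enable trapping so that any FE_INVALID,
// FE_DIVBYZERO or FE_OVERFLOW raised later delivers SIGFPE and aborts the
// program, which is what makes an FPE visible as a hard failure in a test log.
#include <fenv.h>  // feenableexcept is a GNU extension
#include <cstdio>

int main()
{
  feenableexcept( FE_INVALID | FE_DIVBYZERO | FE_OVERFLOW );
  volatile double zero = 0.;
  double bad = 1. / zero;      // raises SIGFPE here instead of returning inf
  std::printf( "%f\n", bad );  // never reached while trapping is enabled
  return 0;
}
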
[amd] rerun 30 tmad tests on LUMI worker node (small-g 72h) - no change (heft fails madgraph5#833, skip ggttggg madgraph5#933)

./tmad/allTees.sh -hip

STARTED  AT Fri 04 Oct 2024 11:53:26 AM EEST
(SM tests)
ENDED(1) AT Fri 04 Oct 2024 02:12:45 PM EEST [Status=0]
(BSM tests)
ENDED(1) AT Fri 04 Oct 2024 02:22:24 PM EEST [Status=0]

16 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_d_inl0_hrd0.txt
16 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_f_inl0_hrd0.txt
16 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_m_inl0_hrd0.txt
12 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_d_inl0_hrd0.txt
12 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_f_inl0_hrd0.txt
12 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_m_inl0_hrd0.txt
16 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_d_inl0_hrd0.txt
16 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_f_inl0_hrd0.txt
16 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_m_inl0_hrd0.txt
16 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_d_inl0_hrd0.txt
16 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_f_inl0_hrd0.txt
16 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_m_inl0_hrd0.txt
16 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_d_inl0_hrd0.txt
16 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_f_inl0_hrd0.txt
16 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_m_inl0_hrd0.txt
16 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_d_inl0_hrd0.txt
16 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_f_inl0_hrd0.txt
16 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_m_inl0_hrd0.txt
16 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_heftggbb_mad/log_heftggbb_mad_d_inl0_hrd0.txt
1 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_heftggbb_mad/log_heftggbb_mad_f_inl0_hrd0.txt
16 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_heftggbb_mad/log_heftggbb_mad_m_inl0_hrd0.txt
16 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_smeftggtttt_mad/log_smeftggtttt_mad_d_inl0_hrd0.txt
16 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_smeftggtttt_mad/log_smeftggtttt_mad_f_inl0_hrd0.txt
16 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_smeftggtttt_mad/log_smeftggtttt_mad_m_inl0_hrd0.txt
16 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_susyggt1t1_mad/log_susyggt1t1_mad_d_inl0_hrd0.txt
16 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_susyggt1t1_mad/log_susyggt1t1_mad_f_inl0_hrd0.txt
16 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_susyggt1t1_mad/log_susyggt1t1_mad_m_inl0_hrd0.txt
16 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_susyggtt_mad/log_susyggtt_mad_d_inl0_hrd0.txt
16 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_susyggtt_mad/log_susyggtt_mad_f_inl0_hrd0.txt
16 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_susyggtt_mad/log_susyggtt_mad_m_inl0_hrd0.txt

eemumu MEK processed 81920 events across 2 channels { 1 : 81920 }
eemumu MEK processed 8192 events across 2 channels { 1 : 8192 }
ggttggg MEK processed 81920 events across 1240 channels { 1 : 81920 }
ggttggg MEK processed 8192 events across 1240 channels { 1 : 8192 }
ggttgg MEK processed 81920 events across 123 channels { 112 : 81920 }
ggttgg MEK processed 8192 events across 123 channels { 112 : 8192 }
ggttg MEK processed 81920 events across 16 channels { 1 : 81920 }
ggttg MEK processed 8192 events across 16 channels { 1 : 8192 }
ggtt MEK processed 81920 events across 3 channels { 1 : 81920 }
ggtt MEK processed 8192 events across 3 channels { 1 : 8192 }
gqttq MEK processed 81920 events across 5 channels { 1 : 81920 }
gqttq MEK processed 8192 events across 5 channels { 1 : 8192 }
heftggbb MEK processed 81920 events across 4 channels { 1 : 81920 }
heftggbb MEK processed 8192 events across 4 channels { 1 : 8192 }
smeftggtttt MEK processed 81920 events across 72 channels { 1 : 81920 }
smeftggtttt MEK processed 8192 events across 72 channels { 1 : 8192 }
susyggt1t1 MEK processed 81920 events across 6 channels { 3 : 81920 }
susyggt1t1 MEK processed 8192 events across 6 channels { 3 : 8192 }
susyggtt MEK processed 81920 events across 3 channels { 1 : 81920 }
susyggtt MEK processed 8192 events across 3 channels { 1 : 8192 }
Revert "[amd] rerun 30 tmad tests on LUMI worker node (small-g 72h) - no change (heft fails madgraph5#833, skip ggttggg madgraph5#933)"
This reverts commit 07c2a53.

Revert "[amd] rerun 96 tput builds and tests on LUMI worker node (small-g 72h) with the workaround for HIP FPEs madgraph5#1011 - now all tests succeed"
This reverts commit 0ec8c1c.
@valassi valassi changed the title from "WIP workaround for FPE in vxxxxx on HIP" to "workaround for FPE in vxxxxx on HIP" Oct 4, 2024
@valassi valassi marked this pull request as ready for review October 4, 2024 15:15
@valassi valassi commented Oct 4, 2024

Hi @oliviermattelaer, this is also ready for merging: it fixes some FPEs on HIP GPUs.

(And it includes #1014.)

Can you please approve? Thanks, Andrea

@oliviermattelaer oliviermattelaer left a comment

Perfect, thanks!

@valassi valassi changed the title from "workaround for FPE in vxxxxx on HIP" to "workaround for FPE in vxxxxx on HIP (and fixes for v1.00.01 tags)" Oct 4, 2024
@valassi valassi commented Oct 4, 2024

Very good @oliviermattelaer thanks!
Merging now
Andrea
