Commit c7b3dc0

[amd] in gq_ttq.mad HiprandRandomNumberKernel.cc, add debug printouts (commented out) for the memory corruption madgraph5#806
This shows an uninitialised value deep inside hiprand

[valassia@nid005067 bash] ~/GPU2024/madgraph4gpu/epochX/cudacpp/gq_ttq.mad/SubProcesses/P1_gux_ttxux > valgrind ./check_hip.exe -p 1 8 1
==105499== Memcheck, a memory error detector
==105499== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==105499== Using Valgrind-3.20.0 and LibVEX; rerun with -h for copyright info
==105499== Command: ./check_hip.exe -p 1 8 1
==105499==
==105499== Warning: set address range perms: large range [0x59c90000, 0x159e91000) (noaccess)
INFO: The following Floating Point Exceptions will cause SIGFPE program aborts: FE_DIVBYZERO, FE_INVALID, FE_OVERFLOW
Get random numbers from Hiprand
==105499== Conditional jump or move depends on uninitialised value(s)
==105499==    at 0x1253777C: ??? (in /opt/rocm-6.0.3/lib/libhsa-runtime64.so.1.12.60003)
==105499==    by 0x12537F40: ??? (in /opt/rocm-6.0.3/lib/libhsa-runtime64.so.1.12.60003)
==105499==    by 0x12540782: ??? (in /opt/rocm-6.0.3/lib/libhsa-runtime64.so.1.12.60003)
==105499==    by 0x125629DD: ??? (in /opt/rocm-6.0.3/lib/libhsa-runtime64.so.1.12.60003)
==105499==    by 0x4B825EB: ??? (in /opt/rocm-6.0.3/lib/libamdhip64.so.6.0.60003)
==105499==    by 0x4B88342: ??? (in /opt/rocm-6.0.3/lib/libamdhip64.so.6.0.60003)
==105499==    by 0x4B822FF: ??? (in /opt/rocm-6.0.3/lib/libamdhip64.so.6.0.60003)
==105499==    by 0x4B55120: ??? (in /opt/rocm-6.0.3/lib/libamdhip64.so.6.0.60003)
==105499==    by 0x4B2B590: ??? (in /opt/rocm-6.0.3/lib/libamdhip64.so.6.0.60003)
==105499==    by 0x49D84AF: ??? (in /opt/rocm-6.0.3/lib/libamdhip64.so.6.0.60003)
==105499==    by 0x49D87C4: ??? (in /opt/rocm-6.0.3/lib/libamdhip64.so.6.0.60003)
==105499==    by 0x4A00FA2: hipMemcpy (in /opt/rocm-6.0.3/lib/libamdhip64.so.6.0.60003)
==105499==
==105499== Conditional jump or move depends on uninitialised value(s)
==105499==    at 0x12537B82: ??? (in /opt/rocm-6.0.3/lib/libhsa-runtime64.so.1.12.60003)
==105499==    by 0x12537F40: ??? (in /opt/rocm-6.0.3/lib/libhsa-runtime64.so.1.12.60003)
==105499==    by 0x12540782: ??? (in /opt/rocm-6.0.3/lib/libhsa-runtime64.so.1.12.60003)
==105499==    by 0x125629DD: ??? (in /opt/rocm-6.0.3/lib/libhsa-runtime64.so.1.12.60003)
==105499==    by 0x4B825EB: ??? (in /opt/rocm-6.0.3/lib/libamdhip64.so.6.0.60003)
==105499==    by 0x4B88342: ??? (in /opt/rocm-6.0.3/lib/libamdhip64.so.6.0.60003)
==105499==    by 0x4B822FF: ??? (in /opt/rocm-6.0.3/lib/libamdhip64.so.6.0.60003)
==105499==    by 0x4B55120: ??? (in /opt/rocm-6.0.3/lib/libamdhip64.so.6.0.60003)
==105499==    by 0x4B2B590: ??? (in /opt/rocm-6.0.3/lib/libamdhip64.so.6.0.60003)
==105499==    by 0x49D84AF: ??? (in /opt/rocm-6.0.3/lib/libamdhip64.so.6.0.60003)
==105499==    by 0x49D87C4: ??? (in /opt/rocm-6.0.3/lib/libamdhip64.so.6.0.60003)
==105499==    by 0x4A00FA2: hipMemcpy (in /opt/rocm-6.0.3/lib/libamdhip64.so.6.0.60003)
==105499==
Got random numbers from Hiprand
==105499== Invalid read of size 8
==105499==    at 0x21F741: std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, float, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, float> > >::operator[](std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (in /pfs/lustrep3/scratch/project_465001114/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/gq_ttq.mad/SubProcesses/P1_gux_ttxux/check_hip.exe)
==105499==    by 0x21D0D1: mgOnGpu::TimerMap::start(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (in /pfs/lustrep3/scratch/project_465001114/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/gq_ttq.mad/SubProcesses/P1_gux_ttxux/check_hip.exe)
==105499==    by 0x215CBB: main (in /pfs/lustrep3/scratch/project_465001114/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/gq_ttq.mad/SubProcesses/P1_gux_ttxux/check_hip.exe)
==105499==  Address 0x1c00000043 is not stack'd, malloc'd or (recently) free'd
==105499==
==105499==
==105499== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==105499==  Access not within mapped region at address 0x1C00000043
==105499==    at 0x21F741: std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, float, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, float> > >::operator[](std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (in /pfs/lustrep3/scratch/project_465001114/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/gq_ttq.mad/SubProcesses/P1_gux_ttxux/check_hip.exe)
==105499==    by 0x21D0D1: mgOnGpu::TimerMap::start(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (in /pfs/lustrep3/scratch/project_465001114/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/gq_ttq.mad/SubProcesses/P1_gux_ttxux/check_hip.exe)
==105499==    by 0x215CBB: main (in /pfs/lustrep3/scratch/project_465001114/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/gq_ttq.mad/SubProcesses/P1_gux_ttxux/check_hip.exe)
==105499==  If you believe this happened as a result of a stack
==105499==  overflow in your program's main thread (unlikely but
==105499==  possible), you can try to increase the size of the
==105499==  main thread stack using the --main-stacksize= flag.
==105499==  The main thread stack size used in this run was 16777216.

Unfortunately however also --common crashes (and gives the same uninitialised problem, whether related or not)
1 parent fe331ed commit c7b3dc0

1 file changed: +2 -0 lines changed
epochX/cudacpp/gq_ttq.mad/SubProcesses/HiprandRandomNumberKernel.cc

Lines changed: 2 additions & 0 deletions
@@ -117,11 +117,13 @@ namespace mg5amcCpu
 
   void HiprandRandomNumberKernel::generateRnarray()
   {
+    //std::cout << "Get random numbers from Hiprand" << std::endl; // debug #806
 #if defined MGONGPU_FPTYPE_DOUBLE
     checkHiprand( hiprandGenerateUniformDouble( m_rnGen, m_rnarray.data(), m_rnarray.size() ) );
 #elif defined MGONGPU_FPTYPE_FLOAT
     checkHiprand( hiprandGenerateUniform( m_rnGen, m_rnarray.data(), m_rnarray.size() ) );
 #endif
+    //std::cout << "Got random numbers from Hiprand" << std::endl; // debug #806
     /*
     printf( "\nHiprandRandomNumberKernel::generateRnarray size = %d\n", (int)m_rnarray.size() );
     fptype* data = m_rnarray.data();
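
The two printouts added here bracket the hiprand call so that the valgrind messages can be located relative to it. As a possible alternative to commenting them in and out by hand, here is a minimal sketch of a switchable tracer. This is not part of this commit: the macro `MGONGPU_DEBUG_806`, the helper `debug806` and the stand-in `generateStub` are all hypothetical names, and the placeholder status variable takes the place of the real `checkHiprand( ... )` call.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical compile-time switch (an assumption, not defined in the repository):
// build with -DMGONGPU_DEBUG_806=1 to enable the printouts by default.
#ifndef MGONGPU_DEBUG_806
#define MGONGPU_DEBUG_806 0
#endif
constexpr bool kDebug806 = ( MGONGPU_DEBUG_806 != 0 );

// Hypothetical helper: prints one marker line and flushes via std::endl,
// so the marker reaches the terminal even if the process crashes soon after.
inline void debug806( const std::string& msg, std::ostream& out, bool enabled )
{
  if( enabled ) out << msg << std::endl;
}

// Stand-in for generateRnarray(): the markers bracket the call under suspicion.
inline int generateStub( std::ostream& out = std::cout, bool enabled = kDebug806 )
{
  debug806( "Get random numbers from Hiprand", out, enabled );
  int status = 0; // placeholder for the real generator call (assumption)
  debug806( "Got random numbers from Hiprand", out, enabled );
  return status;
}
```

Passing a `std::ostringstream` as `out` makes the markers easy to check in a unit test, while the default arguments keep the call site as short as the original function.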
