Determine MPI Data Types in col_on_comm() & dst_on_comm() to prevent displacements overflow. (Fix for #2156) #2157

benkirk · 2025-01-17T20:52:55Z

Determine MPI Data Types in col_on_comm() & dst_on_comm() to prevent displacements overflow.

TYPE: bug fix

KEYWORDS: prevent displacements overflow in MPI_Gatherv() and MPI_Scatterv() operations

SOURCE: Benjamin Kirk & Negin Sobhani (NSF NCAR / CISL)

DESCRIPTION OF CHANGES:
Problem:
The MPI_Gatherv() and MPI_Scatterv() operations require integer displacements into the communications buffers. Historically everything is passed as an MPI_CHAR, causing these displacements to be larger than otherwise necessary. For large domain sizes this can cause the displace[] offsets to exceed the maximum int, wrapping to negative values.

Solution:
This change introduces additional error checking and then uses the function MPI_Type_match_size() (available since MPI-2.0) to determine a suitable MPI_Datatype given the input *typesize. The result then is that the displace[] offsets are in terms of data type extents, rather than bytes, and less likely to overflow.
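
As a rough illustration of the arithmetic (a sketch only, not the code in frame/collect_on_comm.c): the hypothetical helpers `last_displacement()` and `fits_in_int()` below compute the offset of the last rank's block in 64-bit arithmetic, counted either in bytes or in elements. The rank count and per-rank size in the usage note are made-up values, and a 32-bit `int` is assumed.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: offset of the last rank's block when each of
 * `nranks` ranks contributes `elems_per_rank` elements. `scale` is the
 * number of displacement units per element: the type size in bytes when
 * everything is sent as MPI_CHAR, or 1 when a matching datatype is used. */
int64_t last_displacement(int nranks, int64_t elems_per_rank, int scale)
{
    return (int64_t)(nranks - 1) * elems_per_rank * scale;
}

/* True when the offset can be stored in an int displacement array. */
int fits_in_int(int64_t offset)
{
    return offset >= 0 && offset <= INT_MAX;
}
```

With 4 ranks each holding 200,000,000 four-byte values, `last_displacement(4, 200000000, 4)` gives 2,400,000,000 bytes, which does not fit in a 32-bit int, while `last_displacement(4, 200000000, 1)` gives 600,000,000 elements, which does.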

ISSUE: Fixes #2156

LIST OF MODIFIED FILES:
M frame/collect_on_comm.c

TESTS CONDUCTED:
The previously failing cases now run.

RELEASE NOTE:
Determine MPI Data Types in col_on_comm() & dst_on_comm() to prevent displacements overflow.

benkirk · 2025-01-17T22:21:17Z

Just for awareness, I can't see the output of the failed WRF-BUILD-2690; I get a timeout accessing
https://ncar_jenkins.scalacomputing.com/job/WRF-Feature-Regression-Test/2690/console

dudhia · 2025-01-17T22:34:31Z

Common problem - it will be provided by message later

weiwangncar · 2025-01-19T00:43:41Z

The regression test results:

Test Type              | Expected | Received | Failed
= = = = = = = = = = = = = = = = = = = = = = = = = = = =
Number of Tests        : 23         24
Number of Builds       : 60         57
Number of Simulations  : 158        150        0
Number of Comparisons  : 95         86         0

Failed Simulations are: 
None
Which comparisons are not bit-for-bit: 
None

weiwangncar · 2025-01-19T02:44:42Z

@benkirk Thanks for the fix! I tested it for a few cases we could run before and it is working now.

benkirk · 2025-01-21T16:51:54Z

Thanks for the success report @weiwangncar, happy to help!

dudhia · 2025-01-22T16:47:28Z

I will tag @mgduda and @islas to review this.

islas

@benkirk Thanks for the fix! This has been a heck of a problem to hunt down.

If my understanding is correct, the overflow can still manifest, just with a domain approximately 4x larger than what triggered it before, right? Though the error will now be caught, and the run aborted, at the appropriate location.

benkirk · 2025-01-23T01:52:45Z

That's right, the displace[] buffer is now in terms of array elements, not bytes. So for REAL*4 or integer arrays the overflows can still occur, but at ~4x larger problem sizes than before. And in these cases the overflow error will be caught in the routine, so at least the source of the issue is identified.
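
The kind of check described here can be sketched as follows. This is an assumption about the general shape of such a check, not the PR's actual code; `fill_displacements()` is a hypothetical name, and counts are assumed to be non-negative element counts.

```c
#include <limits.h>

/* Hypothetical sketch: fill displace[] with element offsets accumulated
 * in a wider type, and stop at the first rank whose offset no longer
 * fits in an int. Returns -1 on success, otherwise the index of the
 * first rank whose displacement would overflow. */
int fill_displacements(const int *counts, int nranks, int *displace)
{
    long long offset = 0;  /* wider than int, so the sum itself cannot wrap here */
    for (int r = 0; r < nranks; ++r) {
        if (offset > INT_MAX) {
            return r;      /* caller can report the rank and abort */
        }
        displace[r] = (int)offset;
        offset += counts[r];
    }
    return -1;
}
```

A caller would treat any return value other than -1 as fatal and report it before the collective is attempted, rather than passing a wrapped negative offset on to MPI.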

(There are some potential fallbacks even in that case that I can think of but haven't implemented. Those fallbacks would require some conditions on the buffers to be sent, such as being evenly divisible by some integer factor. Such tricks are necessary when trying to send buffers >2GB with MPI.)
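
One way to read the "evenly divisible by some integer factor" condition is sketched below. This is only a guess at the idea, not something implemented in this PR; `common_block_length()` is a hypothetical helper that finds the largest block length dividing every rank's element count.

```c
/* Greatest common divisor of two non-negative ints; gcd_int(0, x) is x. */
static int gcd_int(int a, int b)
{
    while (b != 0) {
        int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Hypothetical helper: largest block length (in elements) that evenly
 * divides every entry of counts[]. Returns 0 if all counts are zero. */
int common_block_length(const int *counts, int nranks)
{
    int g = 0;
    for (int r = 0; r < nranks; ++r) {
        g = gcd_int(g, counts[r]);
    }
    return g;
}
```

If the result `g` is larger than 1, counts and displacements could be expressed in blocks of `g` elements (for instance through a derived type built with MPI_Type_contiguous), which shrinks every offset by a factor of `g`. When the counts share no common factor the result is 1 and this approach gains nothing.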

@cenlinhe

# WRF Version v4.7.0

The WRF model was updated to Version v4.7.0 on April 25, 2025

__Acknowledgements__: We would like to thank

* Adam Dury (WeatherQuest)
* Andrea Zonato, Royal Netherlands Meteorological Institute (KNMI)
* Benjamin Kirk & Negin Sobhani (NSF NCAR / CISL)
* Cenlin He @cenlinhe and Tzu-Shun Lin (NCAR)
* Charlie Li, software developer from lakes environmental, Canada
* Jakub Lewandowski (University of Leeds)
* James Ruppert (University of Oklahoma)
* Joseph Olson (NOAA/GSL)
* Alexander Ukhov (KAUST)
* L. Fita (UBA/CIMA/IFAECI)
* Lukas Pilz (Heidelberg University)
* Martilli, Alberto (CIEMAT)
* Mathieu Landreau (Centrale Nantes)
* Robert Conrick (U. of Washington)
* Robert Gilliam & Jon Pleim, US EPA
* Sergey Osipov (KAUST)
* Tanya Spero (U.S. EPA)
* Ted Mansell (NOAA/NSSL) (@MicroTed)

for their contributions to this release.

## Physics

* Fix an error associated with using LCZ categories in NoahMP. Prior to this fix, the LCZs were not correctly referenced, hence ignored in the NoahMP code. (#2202) [Details](a176a5965)
* NSSL-mp bug fix for (obsolete) droplet nucleation (#2195) [Details](30c03dc40)
* NSSL microphysics scheme updates include 1. An explicit rain breakup for 3-moment rain (addresses issue of cold pools being too warm and drops being too large in rain cores), 2. Improved reflectivity conservation for graupel->hail conversion and drop freezing, 3. More accurate saturation mixing ratio calculation, 4. New default droplet nucleation that controls excess supersaturation much better than previously (and default is to always predict the number of activated CCN). The update has been submitted to CCPP repository as well. (#2170) [Details](9d763af90)
* A new microphysics, UFS Double Moment (UDM), 7-class microphysics from Songyou Hong is added (mp_physics=27). UDM mp largely adopts microphysical processes in WDM7, but with bug fixes or revisions based on literature and accumulated realism. UDM mp utilizes the in-cloud microphysics concept (Kim and Hong 2018), with the addition of water-friendly aerosols for CCN initialization. Semi-lagrangian sedimentation of Juang and Hong (2010) is also re-configured for computational efficiency and numerical accuracy. All production terms are optimized by introducing a cloud-top definition for hydrometeors. (#2147) [Details](5fc76c540)
* Release of the RCON Microphysics package into WRF, which improves upon the warm rain representation of the Thompson-Eidhammer scheme. RCON is based heavily on the Thompson-Eidhammer scheme with a couple significant changes that improve upon the code in module_mp_rcon.F to generate more realistic rainfall during warm rain events with additional benefits for cold rain, especially warm processes during cold rain events. Among the most significant changes for rain productions are 1) the use of a wider cloud water DSD of lognormal shape instead of the gamma DSD used by the Thompson-Eidhammer parameterization and 2) enhancement of the cloud-to-rain autoconversion parameterization to accommodate the new shape. The changes here also allow for sedimentation of cloud water within the lowest model layer, which effectively creates a drizzle mode in the scheme. Accompanying published reference: Conrick, R., C. F. Mass, and L. McMurdie, 2023: Improving Simulations of Warm Rain in a Bulk Microphysics Scheme. Mon. Wea. Rev., 152, 169-185, https://doi.org/10.1175/MWR-D-23-0035.1. (#2144) [Details](de213c920)
* Fix an erroneous print for using ghg_input when no radiation option is selected, mostly from idealized cases. (#2199) [Details](bd4ecbe01)
* Fix a loop index error in bep_bem urban code. (#2196) [Details](0171299d3)
* Noah-MP bug fixes for (1) allowing BATS snow albedo scheme for nighttime snow aging, (2) the potential leakage caused by calculate_soil variable during parallelized run, (3) the missing of HCPCT output for glacier points. (#2160) [Details](fd079bf48)
* The similarity stability functions phim and phieps, necessary for calculating the surface values of tke and dissipation rate in the tke-epsilon-tpe PBL scheme [Zonato et al., 2022](https://doi.org/10.1175/MWR-D-21-0299.1) have been updated considering the correction term accounting for the roughness length z0. No relevant differences are found in temperature, wind speed, and humidity. Regarding turbulence variables, the stable case has just negligible differences, while the unstable case shows higher values of TKE and dissipation rate and lower values of temperature variance. (#2120) [Details](70855a73e)
* Pleim-Xiu LSM is now compatible with 61 category MODIS LCZ landuse dataset. A mode of latent heat effects on Tg from vegetated parts and from wet leaves is added to Pleim-Xiu LSM. (#2023) [Details](b7f31dcde)

## Software

* Determine MPI Data Types in col_on_comm() & dst_on_comm() to prevent displacements overflow. (#2157) [Details](af8101493)
* CMake README documentation on <PackageName>_ROOT variables (#2190) [Details](33036d613)
* CMake README documentation typo fixes (#2189) [Details](3fd1aefda)
* Fix aarch64 GCC build when DM configuration selected (#2192) [Details](8e1d6742c)
* Fixed failed compilation with Intel oneAPI by reworking the dependency linking of hydro CMake compilation (#2178) [Details](2e0694f14)
* Fix compilation of grib2 IO in make build (#2191) [Details](2639dcd3f)
* Fix uncontrollable building of external/io_netcdfpar folder for all stanzas (#2181) [Details](127a8f40a)
* Suppress MYNN-EDMF verify checkout command (#2188) [Details](3f2465b41)
* Fix typo in confcheck CMakeLists.txt for FSEEKO (#2179) [Details](2572bc5f5)
* Add quotes to optimization flags exceptions in CMake (#2180) [Details](b15e341e4)
* CMake Chem and Chem+KPP Build (#2018) [Details](b26e64595)
* Consistent double precision definitions (#2099) [Details](704259871)
* CMake Fix split command flags to be correctly populated (#2108) [Details](5b09725f5)
* CMake WRFPLUS (#2089) [Details](695f455e8)
* Override CMake-injected optimization flags in favor of the flags set by the build system and provided stanza information. (#2138) [Details](b6542b0f7)
* Fixed CMake dev warning `project() should be called prior to this enable_language() call` appearing when using `configure_new` script with some newer versions of CMake. (#2125) [Details](0ccba14eb)
* Add documentation to custom properties in CMake to fix compatibility with older versions. (#2131) [Details](f204246a0)
* Remove leading -D on defines during stanza reading to allow older versions of CMake to configure properly. (#2130) [Details](c2e121f56)
* Bug fix in CMake FindnetCDF.cmake for empty --has-* nc-config fields (#2135) [Details](f096921b2)
* CMake confcheck switch to try_* functions (#2090) [Details](5dd2c192d)
* Bug fix in landread.c to address undefined behavior by adding an explicit return statement in `GET_LANDUSE()` function (#2197) [Details](5ef63ba34)
* Fix memory leaks related to arrays being allocated without being deallocated in start_em and time series calculation subroutines. (#2139) [Details](94aa27a7e)
* Fix an access violation error when a PGI compiler is used with urban variables in module_bl_ysu.F when urban option is turned off and the memories of those arrays are not available. (#2137) [Details](33ce70c0f)
* Updated grav_settling code to better recognize the land use type so it doesn't crash. Also update the error message if it does crash to go into the rsl.error files rather than rsl.out files. (#2110) [Details](b3eebb3fe)
* Bug fix for wrfinput where LCZ urban cells in LU_INDEX were overwritten with default USGS urban category. (#2153) [Details](d96478d4f)
* Add manage_externals tool to access physics modules in MMM-physics git repository. (#2126) [Details](7195dc250)
* Submodule implementation of the MYNN-EDMF (https://github.com/NCAR/MYNN-EDMF). The module names changed from *_mynn_* to *_mynnedmf_* to resolve a version conflict in MPAS. This version was originally developed within FV3/CCPP for RRFSv1, but has been refactored (to a k-only scheme) resulting in a speed-up of about 10-15% and it has since been tuned to better perform in MPAS and WRF compared to previous versions which were primarily developed for use in FV3. (#2148) [Details](383476531)
* When the namelist option write_hist_at_0h_rst is set to .true. under &time_control, history write-out will now be conducted for the first time step for both the 0th stream (wrfout* files) and any special user-defined streams being implemented. (#2133) [Details](61d1c84cb)

## Dynamics

* Corrected algorithms in the tipping bucket for precipitation and in the nudging routines to adjust for imprecision in single-precision real numbers exceeding the resolvable values in long (>23-year) continuous simulations. (#2063) [Details](a32188308)

## Data Assimilation

* This PR adds an incremental analysis update capability. In the DA code, code is added to write out analysis increments for all variables in WRF netCDF format using auxiliary history output stream #5. In the model, analysis increments are divided by the number of time steps in a specified time window and added to the model similar to physics tendencies. The input stream for the model is 15. The capability is turned on by adding iau = 1 and iau_time_window_sec in &time_control. The way the increments are added to the model is similar to what is described in the paper by Chen et al. (https://doi-org.cuucar.idm.oclc.org/10.1175/WAF-D-22-0127.1). (#2151) [Details](6741f010e)

## Chemistry

* Bug fix in the calculation of optical properties. Mass redistribution between GOCART dust/sea salt and MOZAIC bins was corrected. It slightly increased (by 3-5%) the aerosol optical depth (AOD). (#2112) [Details](bb791e73d)
* Fix a bug where TUV and FTUV fail to initialize the distance to the Sun properly if the simulation starts on 1 Jan. (#2171) [Details](9aa3979f0)

## Hydro

* In `hydro.namelist` adding lake_opt to namelist, reservoirs to own namelist. Support for lakes (reservoirs) in non-UDMP reach-based routing added and some style guide cleanup completed. (#2146) [Details](6d1db68f6)
* Hydro reservoir drainage area (DA) lake option bugfix (#2182) [Details](313834d41)

## Miscellaneous

* Update README.namelist file (#2193) [Details](7053a6ae9)
* A namelist option, default_soiltype, is added to define filled-in land category along water/land boundaries where soil data may be missing in program real. (#2166) [Details](2f68d7b70)
* Add dzstretch_u and dzbot in namelist.input. Users are advised to check UG for other parameters to use. (#2165) [Details](89ba5181b)
* Noah-MP code tag is updated to correspond to the WRFV4.7 release. (#2207) [Details](f11e38164)
* Fixed defs for adap time step namelist vars in README.namelist (#2158) [Details](30a16a1ce)

…2231)

TYPE: bug fix

KEYWORDS: mpi, quilting, comm

SOURCE: internal

DESCRIPTION OF CHANGES:
Problem:
PR #2157 added changes to match an appropriate `MPI_Datatype` to a specific `typesize` during `col_on_comm()` and `dst_on_comm()`. This relies on `MPI_Type_match_size()` to query MPI about the equivalent MPI definition for a particular datatype size. There are safety checks to query `MPI_TYPECLASS_INTEGER` if `MPI_TYPECLASS_REAL` fails. However, when given a datatype size that does not match a possible `MPI_TYPECLASS_REAL` value (e.g. 1 byte, where no real exists for a single byte), instead of returning a failure via the return code the query is treated as a critical failure and fully aborts the program. As the query does not rely on critical process handling _and_ since there already exist adequate checks to abort if no sufficient value is found, this preemptive abort is unnecessary.

Solution:
Temporarily install a pass through errhandler that does not modify the return code but also does not abort. Allow the if statements of finding a correct `MPI_Datatype` to abort if deemed necessary. Additionally, once the checks are complete, reinstate any previous errhandler and free our pass through handle.

ISSUE: #2225

TESTS CONDUCTED:
1. Tested the stability of this call to handle correct and incorrect types various times while constantly replacing the error handler.

RELEASE NOTE:
In collect_on_comm.c, use a temporary pass through errhandler to allow MPI_Type_match_size to fail correctly with error code rather than fully abort the program.

benkirk added 2 commits January 17, 2025 12:43

remove trailing whitespace

13bbd1c

benkirk requested a review from a team as a code owner January 17, 2025 20:52

benkirk mentioned this pull request Jan 17, 2025

dmpar WRF crashes with large model wrfinput when using a serial I/O format #1333

Open

islas changed the base branch from master to develop January 17, 2025 21:20

Only detect overflow in col_on_comm(), report, and exit.

13b94c2

Fixes runtime failures caught by CI in the underlying MPI_Gatherv().

weiwangncar added the Develop Branch label Jan 18, 2025

Properly define MPI_Datatype dtype.

0d8def7

Of course dtype needs to be MPI_Datatype, not an int. This error sneaked through MPICH-based tests but not OpenMPI. Hopefully this change will address previous CI failures.

islas approved these changes Jan 23, 2025

weiwangncar approved these changes Feb 5, 2025

islas merged commit af81014 into wrf-model:develop Feb 5, 2025
2 checks passed

mszpindler added a commit to Lumi-supercomputer/LUMI-EasyBuild-contrib that referenced this pull request Mar 6, 2025

Add patch from github.com/wrf-model/WRF/pull/2157

19315a6

mszpindler mentioned this pull request Mar 6, 2025

WRF-SFIRE initial recipe Lumi-supercomputer/LUMI-EasyBuild-contrib#217

Merged

nalssi89 mentioned this pull request May 13, 2025

wrf.exe aborts after upgrading to WRF v4.7.0. #2225

Closed

islas mentioned this pull request May 30, 2025

Use a temp pass through errhandler for trivial errors causing abort #2231

Merged
