@benkirk benkirk commented Jan 17, 2025

Determine MPI Data Types in col_on_comm() & dst_on_comm() to prevent displacements overflow.

TYPE: bug fix

KEYWORDS: prevent displacements overflow in MPI_Gatherv() and MPI_Scatterv() operations

SOURCE: Benjamin Kirk & Negin Sobhani (NSF NCAR / CISL)

DESCRIPTION OF CHANGES:
Problem:
The MPI_Gatherv() and MPI_Scatterv() operations require integer displacements into the communications buffers. Historically everything is passed as an MPI_CHAR, causing these displacements to be larger than otherwise necessary. For large domain sizes this can cause the displace[] offsets to exceed the maximum int, wrapping to negative values.

Solution:
This change introduces additional error checking and then uses the function MPI_Type_match_size() (available since MPI-2.0) to determine a suitable MPI_Datatype given the input *typesize. The result then is that the displace[] offsets are in terms of data type extents, rather than bytes, and less likely to overflow.
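The effect of switching units can be illustrated without MPI. The following is a minimal sketch, not the code in frame/collect_on_comm.c: the helper name `build_displacements` and its parameters are hypothetical, and it assumes non-negative per-rank counts. It accumulates offsets in 64 bits and reports an error instead of letting an `int` wrap.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper, not taken from collect_on_comm.c.
 * Fills displace[i] with the running offset of rank i's block, measured in
 * units of unit_size bytes. counts[i] is the number of elements contributed
 * by rank i, and typesize is the element size in bytes.
 * Use unit_size = 1 for byte offsets (the MPI_CHAR approach) or
 * unit_size = typesize for offsets in element extents.
 * Returns 0 on success, -1 if a count or offset would not fit in an int. */
int build_displacements(const int *counts, int nranks, int typesize,
                        int unit_size, int *displace)
{
    assert(typesize > 0 && unit_size > 0 && typesize % unit_size == 0);
    int64_t scale = typesize / unit_size;   /* units per element */
    int64_t offset = 0;                     /* accumulate in 64 bits */
    for (int i = 0; i < nranks; ++i) {
        int64_t units = (int64_t)counts[i] * scale;
        if (offset > INT_MAX || units > INT_MAX)
            return -1;                      /* would wrap as an int */
        displace[i] = (int)offset;
        offset += units;
    }
    return 0;
}
```

With four ranks each contributing 200,000,000 four-byte values, the byte-based call (`unit_size = 1`) fails at the last rank because its offset of 2,400,000,000 exceeds `INT_MAX`, while the element-based call (`unit_size = 4`) succeeds with a last offset of 600,000,000.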

ISSUE: Fixes #2156

LIST OF MODIFIED FILES:
M frame/collect_on_comm.c

TESTS CONDUCTED:
Failed cases run now.

RELEASE NOTE:
Determine MPI Data Types in col_on_comm() & dst_on_comm() to prevent displacements overflow.

@benkirk benkirk requested a review from a team as a code owner January 17, 2025 20:52
@islas islas changed the base branch from master to develop January 17, 2025 21:20

benkirk commented Jan 17, 2025

Just for awareness, I can't see the output of the failed WRF-BUILD-2690; I get a timeout accessing
https://ncar_jenkins.scalacomputing.com/job/WRF-Feature-Regression-Test/2690/console


dudhia commented Jan 17, 2025 via email

Fixes runtime failures caught by CI in the underlying MPI_Gatherv().
Of course dtype needs to be MPI_Datatype, not an int.  This error sneaked through MPICH-based tests but not OpenMPI.
Hopefully this change will address previous CI failures.
@weiwangncar

The regression test results:

Test Type              | Expected | Received | Failed
= = = = = = = = = = = = = = = = = = = = = = = = = = = =
Number of Tests        : 23         24
Number of Builds       : 60         57
Number of Simulations  : 158        150        0
Number of Comparisons  : 95         86         0

Failed Simulations are: 
None
Which comparisons are not bit-for-bit: 
None

@weiwangncar

@benkirk Thanks for the fix! I tested it for a few cases we could run before and it is working now.


benkirk commented Jan 21, 2025

Thanks for the success report @weiwangncar, happy to help!


dudhia commented Jan 22, 2025

I will tag @mgduda and @islas to review this.


@islas islas left a comment


@benkirk Thanks for the fix! This has been a heck of a problem to hunt down.

If my understanding is correct, the overflow can still manifest, just with a domain approximately 4x the size of what triggered it before, right? Though the error will now be caught and the run aborted at the appropriate location.


benkirk commented Jan 23, 2025

That's right, the displace[] buffer is now in terms of array elements, not bytes. So for REAL*4 or integer arrays the overflows can still occur, but at ~4x larger problem sizes than before. And in these cases the overflow error will be caught in the routine, so at least the source of the issue is identified.

(There are some potential fallbacks even in that case that I can think of but haven't implemented. Those fallbacks would require some conditions on the buffers to be sent, such as being evenly divisible by some integer factor. Such tricks are necessary when trying to send buffers >2GB with MPI.)
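One way such a fallback could look is sketched below. This is an illustration only and is not implemented in this PR: the helper `blocked_displacements` and its parameters are hypothetical. It looks for a block size that evenly divides every per-rank count, so counts and offsets can be expressed in blocks rather than elements. A caller could then, for example, describe one block with a derived datatype built by `MPI_Type_contiguous()`; that step is omitted here to keep the sketch free of MPI dependencies.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Greatest common divisor of two non-negative values. */
static int64_t gcd64(int64_t a, int64_t b)
{
    while (b != 0) {
        int64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Hypothetical helper, not part of WRF.
 * counts[i] is the number of elements contributed by rank i (64-bit).
 * Finds the largest block size that evenly divides every count, then
 * writes per-rank counts and offsets measured in blocks.
 * Returns 0 on success, -1 if the block size, a block count, or an
 * offset still does not fit in an int. */
int blocked_displacements(const int64_t *counts, int nranks,
                          int *block_counts, int *displace, int64_t *block)
{
    int64_t g = 0;
    for (int i = 0; i < nranks; ++i) {
        assert(counts[i] >= 0);
        g = gcd64(g, counts[i]);
    }
    if (g == 0)
        g = 1;                     /* all counts are zero */
    if (g > INT_MAX)
        return -1;                 /* block itself is too large */

    int64_t offset = 0;            /* running offset in blocks */
    for (int i = 0; i < nranks; ++i) {
        int64_t n = counts[i] / g;
        if (offset > INT_MAX || n > INT_MAX)
            return -1;
        block_counts[i] = (int)n;
        displace[i] = (int)offset;
        offset += n;
    }
    *block = g;
    return 0;
}
```

The divisibility condition is the limiting factor: if the counts share no common factor larger than 1, the block size is 1 and nothing is gained over element-based offsets.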

@islas islas merged commit af81014 into wrf-model:develop Feb 5, 2025
2 checks passed
mszpindler added a commit to Lumi-supercomputer/LUMI-EasyBuild-contrib that referenced this pull request Mar 6, 2025
islas added a commit that referenced this pull request Apr 25, 2025
# WRF Version v4.7.0
The WRF model has been updated to Version v4.7.0 on April 25, 2025

__Acknowledgements__: We would like to thank
*  Adam Dury (WeatherQuest)
*  Andrea Zonato, Royal Netherlands Meteorological Institute (KNMI)
*  Benjamin Kirk & Negin Sobhani (NSF NCAR / CISL)
*  Cenlin He @cenlinhe and Tzu-Shun Lin (NCAR)
*  Charlie Li, software developer from lakes environmental, Canada
*  Jakub Lewandowski (University of Leeds)
*  James Ruppert (University of Oklahoma)
*  Joseph Olson (NOAA/GSL)
*  Alexander Ukhov (KAUST)
*  L. Fita (UBA/CIMA/IFAECI)
*  Lukas Pilz (Heidelberg University)
*  Martilli, Alberto (CIEMAT)
*  Mathieu Landreau (Centrale Nantes)
*  Robert Conrick (U. of Washington); [email protected]
*  Robert Gilliam & Jon Pleim, US EPA
*  Sergey Osipov (KAUST)
*  Tanya Spero (U.S. EPA)
*  Ted Mansell (NOAA/NSSL) (@MicroTed )

for their contributions to this release.

## Physics

* Fix an error associated with using LCZ categories in NoahMP. Prior to this fix, the LCZs were not correctly referenced, hence ignored in the NoahMP code.  (#2202) [Details](a176a5965)
* NSSL-mp bug fix for (obsolete) droplet nucleation (#2195)  [Details](30c03dc40)
* NSSL microphysics scheme updates include 1. An explicit rain breakup for 3-moment rain (addresses issue of cold pools being too warm and drops being too large in rain cores), 2. Improved reflectivity conservation for graupel->hail conversion and drop freezing, 3. More accurate saturation mixing ratio calculation, 4. New default droplet nucleation that controls excess supersaturation much better than previously (and default is to always predict the number of activated CCN). The update has been submitted to CCPP repository as well.  (#2170) [Details](9d763af90)
* A new microphysics scheme, the UFS Double Moment (UDM) 7-class microphysics from Songyou Hong, is added (mp_physics=27). UDM mp largely adopts microphysical processes in WDM7, but with bug fixes or revisions based on literature and accumulated realism. UDM mp utilizes the in-cloud microphysics concept (Kim and Hong 2018), with the addition of water-friendly aerosols for CCN initialization. Semi-lagrangian sedimentation of Juang and Hong (2010) is also re-configured for computational efficiency and numerical accuracy. All production terms are optimized by introducing a cloud-top definition for hydrometeors.  (#2147) [Details](5fc76c540)
* Release of the RCON Microphysics package into WRF, which improves upon the warm rain representation of the Thompson-Eidhammer scheme.  RCON is based heavily on the Thompson-Eidhammer scheme with a couple significant changes that improve upon the code in module_mp_rcon.F to generate more realistic rainfall during warm rain events with additional benefits for cold rain, especially warm processes during cold rain events.  Among the most significant changes for rain productions are 1) the use of a wider cloud water DSD of lognormal shape instead of the gamma DSD used by the Thompson-Eidhammer parameterization and 2) enhancement of the cloud-to-rain autoconversion parameterization to accommodate the new shape. The changes here also allow for sedimentation of cloud water within the lowest model layer, which effectively creates a drizzle mode in the scheme.  Accompanying published reference: Conrick, R., C. F. Mass, and L. McMurdie, 2023: Improving Simulations of Warm Rain in a Bulk Microphysics Scheme. Mon. Wea. Rev., 152, 169-185, https://doi.org/10.1175/MWR-D-23-0035.1.  (#2144) [Details](de213c920)
* Fix an erroneous print for using ghg_input when no radiation option is selected, mostly from idealized cases.  (#2199) [Details](bd4ecbe01)
* Fix a loop index error in bep_bem urban code.  (#2196) [Details](0171299d3)
* Noah-MP bug fixes for (1) allowing BATS snow albedo scheme for nighttime snow aging, (2) the potential leakage caused by calculate_soil variable during parallelized run, (3) the missing HCPCT output for glacier points.  (#2160) [Details](fd079bf48)
* The similarity stability functions phim and phieps, necessary for calculating the surface values of tke and dissipation rate in the tke-epsilon-tpe PBL scheme [Zonato et al., 2022](https://doi.org/10.1175/MWR-D-21-0299.1) have been updated considering the correction term accounting for the roughness length z0. No relevant differences are found in temperature, wind speed, and humidity. Regarding turbulence variables, the stable case has just negligible differences, while the unstable case shows higher values of TKE and dissipation rate and lower values of temperature variance.  (#2120) [Details](70855a73e)
* Pleim-Xiu LSM is now compatible with 61 category MODIS LCZ landuse dataset. A mode of latent heat effects on Tg from vegetated parts and from wet leaves is added to Pleim-Xiu LSM.  (#2023) [Details](b7f31dcde)


## Software

* Determine MPI Data Types in col_on_comm() & dst_on_comm() to prevent displacements overflow.  (#2157) [Details](af8101493)
* CMake README documentation on <PackageName>_ROOT variables (#2190)  [Details](33036d613)
* CMake README documentation typo fixes (#2189)  [Details](3fd1aefda)
* Fix aarch64 GCC build when DM configuration selected (#2192)  [Details](8e1d6742c)
* Fixed failed compilation with Intel oneAPI by reworking the dependency linking of hydro CMake compilation  (#2178) [Details](2e0694f14)
* Fix compilation of grib2 IO in make build (#2191)  [Details](2639dcd3f)
* Fix uncontrollable building of external/io_netcdfpar folder for all stanzas  (#2181) [Details](127a8f40a)
* Suppress MYNN-EDMF verify checkout command (#2188)  [Details](3f2465b41)
* Fix typo in confcheck CMakeLists.txt for FSEEKO (#2179) [Details](2572bc5f5)
* Add quotes to optimization flags exceptions in CMake (#2180) [Details](b15e341e4)
* CMake Chem and Chem+KPP Build  (#2018) [Details](b26e64595)
* Consistent double precision definitions (#2099)  [Details](704259871)
* CMake Fix split command flags to be correctly populated (#2108)  [Details](5b09725f5)
* CMake WRFPLUS (#2089)  [Details](695f455e8)
* Override CMake-injected optimization flags in favor of the flags set by the build system and provided stanza information.  (#2138) [Details](b6542b0f7)
* Fixed CMake dev warning `project() should be called prior to this enable_language() call` appearing when using `configure_new` script with some newer versions of CMake.  (#2125) [Details](0ccba14eb)
* Add documentation to custom properties in CMake to fix compatibility with older versions.  (#2131) [Details](f204246a0)
* Remove leading -D on defines during stanza reading to allow older versions of CMake to configure properly.  (#2130) [Details](c2e121f56)
* Bug fix in CMake FindnetCDF.cmake for empty --has-* nc-config fields  (#2135) [Details](f096921b2)
* CMake confcheck switch to try_* functions (#2090)  [Details](5dd2c192d)
* Bug fix in landread.c to address undefined behavior by adding an explicit return statement in `GET_LANDUSE()` function  (#2197) [Details](5ef63ba34)
* Fix memory leaks related to arrays being allocated without being deallocated in start_em and time series calculation subroutines.  (#2139) [Details](94aa27a7e)
* Fix an access violation error when a PGI compiler is used with urban variables in module_bl_ysu.F when urban option is turned off and the memories of those arrays are not available.  (#2137) [Details](33ce70c0f)
* Updated grav_settling code to better recognize the land use type so it doesn't crash. Also update the error message if it does crash to go into the rsl.error files rather than rsl.out files.  (#2110) [Details](b3eebb3fe)
* Bug fix for wrfinput where LCZ urban cells in LU_INDEX were overwritten with default USGS urban category.  (#2153) [Details](d96478d4f)
* Add manage_externals tool to access physics modules in MMM-physics git repository.  (#2126) [Details](7195dc250)
* Submodule implementation of the MYNN-EDMF (https://github.com/NCAR/MYNN-EDMF). The module names changed from *_mynn_* to *_mynnedmf_* to resolve a version conflict in MPAS. This version was originally developed within FV3/CCPP for RRFSv1, but has been refactored (to a k-only scheme) resulting in a speed-up of about 10-15% and it has since been tuned to better perform in MPAS and WRF compared to previous versions which were primarily developed for use in FV3.  (#2148) [Details](383476531)
* When the namelist option write_hist_at_0h_rst is set to .true. under &time_control, history write-out will now be conducted for the first time step for both the 0th stream (wrfout* files) and any special user-defined streams being implemented.  (#2133) [Details](61d1c84cb)

## Dynamics

* Corrected algorithms in the tipping bucket for precipitation and in the nudging routines to adjust for imprecision in single-precision real numbers exceeding the resolvable values in long (>23-year) continuous simulations.  (#2063) [Details](a32188308)
## Data Assimilation

* This PR adds an incremental analysis update capability. In the DA code, code is added to write out analysis increments for all variables in WRF netCDF format using auxiliary history output stream #5. In the model, analysis increments are divided by the number of time steps in a specified time window and added to the model similar to physics tendencies. The input stream for the model is 15. The capability is turned on by adding iau = 1 and iau_time_window_sec in &time_control. The way the increments are added to the model is similar to what is described in the paper by Chen et al. (https://doi-org.cuucar.idm.oclc.org/10.1175/WAF-D-22-0127.1).  (#2151) [Details](6741f010e)
## Chemistry

* Bug fix in the calculation of optical properties. Mass redistribution between GOCART dust/sea salt and MOZAIC bins was corrected. It slightly increased (by 3-5%) the aerosol optical depth (AOD).  (#2112) [Details](bb791e73d)
* Fix a bug where TUV and FTUV fail to initialize the distance to the Sun properly if the simulation starts on 1 Jan.  (#2171) [Details](9aa3979f0)

## Hydro

* In `hydro.namelist` adding lake_opt to namelist, reservoirs to own namelist. Support for lakes (reservoirs) in non-UDMP reach-based routing added and some style guide cleanup completed.  (#2146) [Details](6d1db68f6)
* Hydro reservoir drainage area (DA) lake option bugfix  (#2182) [Details](313834d41)

## Miscellaneous

* Update README.namelist file (#2193)  [Details](7053a6ae9)
* A namelist option, default_soiltype, is added to define filled-in land category along water/land boundaries where soil data may be missing in program real.  (#2166) [Details](2f68d7b70)
* Add dzstretch_u and dzbot in namelist.input. Users are advised to check UG for other parameters to use.  (#2165) [Details](89ba5181b)
* Noah-MP code tag is updated to correspond to the WRFV4.7 release.  (#2207) [Details](f11e38164)
* Fixed defs for adap time step namelist vars in README.namelist (#2158)  [Details](30a16a1ce)
islas added a commit that referenced this pull request May 31, 2025
…2231)

TYPE: bug fix

KEYWORDS: mpi, quilting, comm

SOURCE: internal

DESCRIPTION OF CHANGES:
Problem:
PR #2157 added changes to match an appropriate `MPI_Datatype` to a
specific `typesize` during `col_on_comm()` and `dst_on_comm()`. This
relies on `MPI_Type_match_size()` to query MPI about the equivalent MPI
definition for a particular datatype size. There are safety checks to
query `MPI_TYPECLASS_INTEGER` if `MPI_TYPECLASS_REAL` fails.

However, when given a datatype size that has no matching
`MPI_TYPECLASS_REAL` value (e.g. 1 byte, since no single-byte real
exists), the query does not report failure via its return code. Instead
it is treated as a critical failure and fully aborts the program. As the
query does not rely on critical process handling _and_ since adequate
checks already exist to abort if no suitable value is found, this
preemptive abort is unnecessary.

Solution:
Temporarily install a pass through errhandler that does not modify the
return code but also does not abort. Allow the if statements of finding
a correct `MPI_Datatype` to abort if deemed necessary. Additionally,
once the checks are complete, reinstate any previous errhandler and free
our pass through handle.

ISSUE: 
#2225 

TESTS CONDUCTED: 
1. Tested the stability of this call to handle correct and incorrect
types various times while constantly replacing the error handler.

RELEASE NOTE: 
In collect_on_comm.c, use a temporary pass through errhandler to allow
MPI_Type_match_size to fail correctly with error code rather than fully
abort the program.

Development

Successfully merging this pull request may close these issues.

MPI_Gatherv/MPI_Scatterv displacements overflow in frame/collect_on_comm.c

4 participants