@climbfuji
Collaborator

@climbfuji climbfuji commented Oct 7, 2025

Description

This PR adds the capability to parse nested suites in capgen, as discussed in #275. The implementation supports two types of nested suites using the XML element nested_suite.

> cat test/nested_suite_test/main_suite.xml
<?xml version="1.0" encoding="UTF-8"?>

<suites version="2.0">

  <suite name="radiation3_suite">
    <group name="rad_lw_group">
      <scheme>rad_lw</scheme>
    </group>
    <group name="rad_sw_group">
      <nested_suite name="radiation3_subsuite" group="rad_sw_group" file="radiation3_subsuite.xml"/>
    </group>
  </suite>

  <suite name="main_suite">
    <group name="radiation1">
      <subcycle loop="num_subcycles_for_effr">
        <scheme>effr_pre</scheme>
        <subcycle loop="2">
          <subcycle loop="2">
            <scheme>effr_calc</scheme>
          </subcycle>
        </subcycle>
        <scheme>effr_post</scheme>
      </subcycle>
      <nested_suite name="radiation2_suite" group="effrs_calc" file="radiation2_suite.xml"/>
    </group>
    <nested_suite name="radiation3_suite"/>
  </suite>

</suites>

Notes/Features

  1. The nested suites can reside in the same file or external xml files, and the suites can be nested recursively.
  2. All external xml files must be valid under the new XML schema 2.0 that was added to support multiple suites within a suite definition file under a root <suites> element.
  3. Old suite definition files using XML schema 1.0 are still supported, but nested suites are only supported for schema version 2 and above.
  4. Nested suites inside a group construct must have the group= attribute; the content of that group in the referenced suite is merged into the existing group while preserving the order. This is the first example (aka the merge=true example) described in Add suite keyword to suite definition file? #275 (comment).
  5. Nested suites one level up, i.e. at the group level, must not have the group= attribute; the content of the entire referenced suite (i.e. all groups) is added in the correct location in the main suite. This is the second example (aka the merge=false example) described in Add suite keyword to suite definition file? #275 (comment).
  6. There is no limit on how many nested suites can be referenced in a suite at either of the two levels described in 4 and 5.
  7. Nested suites can be recursive, and external xml files containing nested suites can reference additional suites in the same file or in external files.
  8. Disclaimer: I don't like writing docstrings and doctests, so I asked ChatGPT to do that for me. It did a decent job, certainly better than I would have, but I still had to go in and correct a few things.
  9. The PrettyElementTree class was replaced by a much shorter function write_xml_file that uses features available in the already-used XML python library. And since it contained dom twice in the name, I simply couldn't resist! The output of the new implementation is almost identical to the previous PrettyElementTree output, see screenshot at the bottom of the PR description.
  10. More than 2/3 of the changed/added files in this PR are just for the new test nested_suite_test, thus the actual changes to the code are quite small and hopefully easy to review.
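The two placement rules in notes 4 and 5 can be illustrated with a minimal sketch. This is not the code from this PR: `splice`, `expand_one_level` and the `library` dict (suite name to parsed `<suite>` element, standing in for the same-file or external-file lookup) are hypothetical names, and the sketch handles only one level of nesting and only `nested_suite` elements that are direct children of a suite or group.

```python
import copy
import xml.etree.ElementTree as ET


def splice(parent, placeholder, replacements):
    """Replace `placeholder` in `parent` by deep copies of `replacements`, keeping order."""
    index = list(parent).index(placeholder)
    parent.remove(placeholder)
    for offset, item in enumerate(replacements):
        parent.insert(index + offset, copy.deepcopy(item))


def expand_one_level(suite, library):
    """Expand every direct <nested_suite> child of `suite` and of its groups once.

    `library` is a hypothetical dict mapping suite names to parsed <suite> elements.
    """
    # Group level (note 4): group= names the group whose content is merged in.
    for group in suite.findall("group"):
        for nested in group.findall("nested_suite"):
            source = library[nested.get("name")]
            wanted = source.find(f"group[@name='{nested.get('group')}']")
            splice(group, nested, list(wanted))
    # Suite level (note 5): no group= attribute, all groups are inserted.
    for nested in suite.findall("nested_suite"):
        source = library[nested.get("name")]
        splice(suite, nested, source.findall("group"))


library = {
    "rad": ET.fromstring(
        "<suite name='rad'>"
        "<group name='lw'><scheme>rad_lw</scheme></group>"
        "<group name='sw'><scheme>rad_sw</scheme></group>"
        "</suite>"
    )
}
main = ET.fromstring(
    "<suite name='main'>"
    "<group name='g1'><scheme>pre</scheme>"
    "<nested_suite name='rad' group='sw'/><scheme>post</scheme></group>"
    "<nested_suite name='rad'/>"
    "</suite>"
)
expand_one_level(main, library)
print(ET.tostring(main, encoding="unicode"))
```

Deep copies are used so that the referenced suite stays unmodified and can be nested more than once.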

User interface changes?: Yes (sort of), but these are optional. In order to use the new functionality, users have to update their suite definition file to the new XML schema version 2.0 (including all XML files that contain nested suites) and use the above syntax for nested_suite elements. There are no user interface changes for the previous XML schema 1.0, which remains valid, and there are no user interface changes for invoking capgen or in the auto-generated code.

Issues

Fixes #275

Testing

Test removed: none
Test added:
- added test/nested_suite_test - see README.md in that directory for more information
- added doctests for the new functions in xml_tools.py
Unit tests: all pass
System tests: all pass
Manual testing: nested_suite_test

Additional information

Difference in datatable.xml output of the new write_xml_file function (left) and the old PrettyElementTree class (right):

image

In a nutshell, the differences are that there is no whitespace before the closing characters of XML elements, and that there is an additional <?xml ...?> line at the top. Both of these seem reasonable to me.
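For readers who want to reproduce this output style, here is a minimal sketch. It assumes that the standard-library module in question is `xml.dom.minidom` (a guess based on the "dom twice in the name" remark); `pretty_xml` is a hypothetical helper, not the `write_xml_file` function from this PR.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom


def pretty_xml(root, indent="  "):
    """Return `root` as an indented XML string with an XML declaration line."""
    raw = ET.tostring(root, encoding="unicode")
    pretty = minidom.parseString(raw).toprettyxml(indent=indent)
    # Drop whitespace-only lines, which appear if the input was already indented.
    return "\n".join(line for line in pretty.splitlines() if line.strip()) + "\n"


root = ET.fromstring(
    '<suite name="demo"><group name="g">'
    '<scheme>s1</scheme><nested_suite name="x"/>'
    "</group></suite>"
)
text = pretty_xml(root)
print(text)
```

With this approach the declaration line and the `/>` without a preceding space both come from `toprettyxml` itself.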

@climbfuji climbfuji changed the title Work in progress: Nested suites in capgen Add capability to parse nested suites in capgen Oct 7, 2025
@climbfuji climbfuji marked this pull request as ready for review October 7, 2025 20:24
@climbfuji climbfuji moved this to In progress in capgen unification Oct 7, 2025
@climbfuji climbfuji self-assigned this Oct 7, 2025
@climbfuji climbfuji added the capgen-unification Issues/PRs necessary for capgen/prebuild unification label Oct 7, 2025
@climbfuji climbfuji linked an issue Oct 7, 2025 that may be closed by this pull request
Collaborator

@mkavulich mkavulich left a comment

I have a few questions

... </suites>
... '''
>>> root = ET.fromstring(xml)
>>> expand_nested_suites(root, logger)
Collaborator

I'm curious if you've tested the behavior of infinite recursion, is that something we should have a test for to make sure it fails gracefully?

Collaborator Author

No, and I recall that we discussed this two weeks ago. If I remember correctly, we assumed that Python would throw an error about the recursion (but we didn't try). We could make a doctest for this, though.

Collaborator

I wonder if, as part of this algorithm, it would make sense to keep a list of the processed nested suites that themselves contain another nested suite. Couldn't you avoid infinite recursion by stopping if the nested suite you are about to process already occurs in that list?

Collaborator Author

Implemented in 0d144ae

Collaborator

I'm not happy with the current solution, it gives no hints as to where the problem might lie.

Collaborator Author

How about we use the current solution (since it is the safest and easiest one), but collect the names of the suites and just spit out that list when we throw the exception?
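One way to report the offending suites is sketched below. It is an illustration only and not the code in 0d144ae: `expand` and the `references` dict (suite name to a list of `("scheme", name)` or `("nested", suite_name)` entries) are hypothetical, and the sketch tracks the chain of suites currently being expanded rather than every suite processed so far.

```python
def expand(name, references, chain=()):
    """Return the flattened list of scheme names for suite `name`.

    `references` is a hypothetical dict: suite name -> list of entries, where
    an entry is ("scheme", scheme_name) or ("nested", suite_name).
    `chain` holds the suites currently being expanded, outermost first.
    """
    if name in chain:
        # Include the whole chain so the user can see where the loop closes.
        cycle = " -> ".join(chain + (name,))
        raise ValueError(f"Circular nested suite reference: {cycle}")
    schemes = []
    for kind, value in references[name]:
        if kind == "scheme":
            schemes.append(value)
        else:
            schemes.extend(expand(value, references, chain + (name,)))
    return schemes


references = {
    "a": [("scheme", "s1"), ("nested", "b")],
    "b": [("nested", "c"), ("scheme", "s2")],
    "c": [("nested", "a")],
}
try:
    expand("a", references)
except ValueError as err:
    message = str(err)
    print(message)
```

Because only the active chain is checked, a suite that is legitimately nested twice in different places does not trigger the error.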

Collaborator

@grantfirl grantfirl left a comment

This is slick. I tried to review in order to understand how to use the new functionality and the implementation in the python scripts. I didn't really review the added test.

Collaborator

@peverwhee peverwhee left a comment

a couple of requests. I haven't done a full test review yet but will get to that in my second review!

@gold2718
Collaborator

Nested suites one level up, i.e. at the group level, must not have the group= attribute

Why is this requirement needed or desired?

@gold2718
Collaborator

Is it okay for a SDF to have more than one suite after expansion of the nested suites?
If this is true, why are suites which happen to be expanded into a different suite in the same SDF removed? Would it be illegal somehow to keep both?
It is certainly less fragile to remove them (as opposed to generating a suite cap for a subsuite), however, the current behavior will need to be clearly documented.

@climbfuji
Collaborator Author

Is it okay for a SDF to have more than one suite after expansion of the nested suites? If this is true, why are suites which happen to be expanded into a different suite in the same SDF removed? Would it be illegal somehow to keep both? It is certainly less fragile to remove them (as opposed to generating a suite cap for a subsuite), however, the current behavior will need to be clearly documented.

Since you are still awake ;-)

The removal of the expanded suite happens entirely in memory and ONLY within the expansion of the top-level suite you are processing. That means if you have two suites A and B that capgen is supposed to process, and both refer to the same nested suite, they will both see the original, unmodified suites as they are on the file system.

The removal was done at least so that the fully expanded suite can be written to disk and to the datatable (so that it is clear that this is the final suite that is getting processed). I don't recall offhand whether there was another reason for it; it's been too long since I worked on the code.

@gold2718
Collaborator

The removal of the expanded suite happens entirely in memory and ONLY within the expansion of the top-level suite you are processing.

I do not have any issue with this feature, removing a suite which is nested inside another suite in the same file seems correct.

However, that is not my question. As an example, say you have an SDF which has two suites with neither being nested. I think we end up with two suites. Is that correct? Does / should capgen handle that situation as if these suites were in two different SDFs?

@climbfuji
Collaborator Author

climbfuji commented Nov 12, 2025

@gold2718 Ignore all of what I wrote at an apparently too early time today. PEBKAC.

I do see this when I process multiple suites in one SDF that remain after expansion:

      if (trim(suite_name) == 'main_suite') then
         call main_suite_timestep_initial(errflg=errflg, errmsg=errmsg)
      else if (trim(suite_name) == 'another_suite') then
         call another_suite_timestep_initial(errflg=errflg, errmsg=errmsg)
      else
         write(errmsg, '(3a)')"No suite named ", trim(suite_name), "found"
         errflg = 1
      end if

But questions remain. Do we want this? Should there be only one remaining suite after expanding suites?

The more I think about it, I get the impression that it would be easier to go back to just one suite per SDF.

@climbfuji
Collaborator Author

@gold2718 I implemented the capability to insert a specific group at the suite level, as per #691 (comment) and the discussion we had at the tag up on Monday.

Commit:

commit f336c174b618817bdcaee84abe3db65dd43d96ea (HEAD -> feature/nested_suites)
Author: Dom Heinzeller <[email protected]>
Date:   Wed Nov 12 10:36:19 2025 -0700

    Add capability to insert only a specific group of a nested suite at the suite level

The output for the nested_suite_test is:

> cat test/nested_suite_test/ccpp/main_suite_expanded.xml
<?xml version="1.0" ?>
<suites version="2.0">
  <suite name="main_suite">
    <group name="radiation1">
      <subcycle loop="num_subcycles_for_effr">
        <scheme>effr_pre</scheme>
        <subcycle loop="2">
          <subcycle loop="2">
            <scheme>effr_calc</scheme>
          </subcycle>
        </subcycle>
        <scheme>effr_post</scheme>
      </subcycle>
      <subcycle loop="num_subcycles_for_effr">
        <scheme>effrs_calc</scheme>
      </subcycle>
      <scheme>effr_diag</scheme>
    </group>
    <group name="rad_lw_group">
      <scheme>rad_lw</scheme>
    </group>
    <group name="rad_sw_group">
      <scheme>rad_sw</scheme>
    </group>
  </suite>
</suites>

@gold2718
Collaborator

The more I think about it, I get the impression that it would be easier to go back to just one suite per SDF.

I would support this. We could drop the suites tag, add the version back to suite, and just add the nested_suite tag. Thoughts @peverwhee, @grantfirl, @mkavulich, @dustinswales (fingers crossed on that last mention)?

@peverwhee
Collaborator

The more I think about it, I get the impression that it would be easier to go back to just one suite per SDF.

I would support this. We could drop the suites tag, add the version back to suite, and just add the nested_suite tag. Thoughts @peverwhee, @grantfirl, @mkavulich, @dustinswales (fingers crossed on that last mention)?

I agree - one suite per SDF sounds simplest.

@climbfuji
Collaborator Author

@peverwhee @gold2718 I reverted to only one suite per SDF. The code is so much cleaner! The only things that got a little messy are the doctests, because I need to read nested suites from files (I could have tried Python mock etc., but this seemed easy enough). I also implemented what I believe is a robust way of catching recursions.

@mkavulich
Collaborator

@climbfuji So if I'm understanding right, the difference between this latest implementation and the original is that we allow nested suites, but only if they are defined in another file?

@climbfuji
Collaborator Author

@climbfuji So if I'm understanding right, the difference between this latest implementation and the original is that we allow nested suites, but only if they are defined in another file?

Correct. That eliminates the need for <suites> and makes schema v2 much simpler.

Collaborator

@mkavulich mkavulich left a comment

Looks good to me (again)

Collaborator

@peverwhee peverwhee left a comment

thanks @climbfuji !

SIMA will eventually need an offline way to generate the expanded SDF because of the way the build system works, but I'll open an issue and make a PR for that in the future!

Collaborator

@gold2718 gold2718 left a comment

I have not done a complete code review but I wrote some tests instead. I will submit my changes and new test files as a PR to this branch.
I think all the tests should pass, we should discuss any with which you disagree.

</xs:choice>
</xs:sequence>
<xs:attribute name="name" type="xs:ID" use="required"/>
<xs:attribute name="lib" type="xs:string" use="optional"/>
Collaborator

What is this for?

Collaborator Author

I don't think it's used anywhere. This came from the suite_v1.0 schema. I'll comb through the code and if we don't need it, I'll remove it. My guess is that it's from the old days of the "dynamic CCPP".

Collaborator Author

Removed in 4ac80e4

@gold2718
Collaborator

Is there a purpose to this branch?

@climbfuji
Collaborator Author

Is there a purpose to this branch?

No, I must have accidentally pushed to the wrong remote. I deleted it.

@gold2718
Collaborator

SIMA

@peverwhee, if my new tests get accepted, there are examples of generating the expanded suites. Something like:

        _, xml_root = read_xml_file(suite_source, logger)
        expand_nested_suites(xml_root, <suites dir>, logger=logger)
        write_xml_file(xml_root, expanded_suite, logger)

Can the CAM-SIMA build system just wrap this in a function if it needs an expanded suite file? All the routines are in xml_tools.py
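A possible shape for such a wrapper is sketched below. The name `write_expanded_sdf` is hypothetical, and the three routines are passed in as callables so that the sketch does not depend on importing `xml_tools.py`; their call shapes are copied from the snippet above and are otherwise assumptions. The `_read`, `_expand` and `_write` functions are stand-ins that only demonstrate the call order.

```python
import logging


def write_expanded_sdf(suite_source, suites_dir, expanded_suite,
                       read_xml_file, expand_nested_suites, write_xml_file,
                       logger=None):
    """Read an SDF, expand its nested suites in place, and write the result.

    The three callables are assumed to behave like the routines of the same
    name in xml_tools.py, called as in the snippet above.
    """
    logger = logger or logging.getLogger(__name__)
    _, xml_root = read_xml_file(suite_source, logger)
    expand_nested_suites(xml_root, suites_dir, logger=logger)
    write_xml_file(xml_root, expanded_suite, logger)
    return expanded_suite


# Stand-in callables (not the real xml_tools routines) that record the calls.
calls = []


def _read(path, logger):
    calls.append(("read", path))
    return None, {"source": path, "expanded": False}


def _expand(root, suites_dir, logger=None):
    calls.append(("expand", suites_dir))
    root["expanded"] = True


def _write(root, path, logger):
    calls.append(("write", path, root["expanded"]))


result = write_expanded_sdf("main_suite.xml", "suites", "main_suite_expanded.xml",
                            _read, _expand, _write)
print(result, calls)
```

In a real build, one would presumably import the three routines from `xml_tools.py` and pass them in place of the stand-ins.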

@climbfuji
Collaborator Author

climbfuji commented Nov 20, 2025

Interesting, after merging @gold2718's PR into my branch (all tests passed for that PR in my fork), I am now seeing this (https://github.com/NCAR/ccpp-framework/actions/runs/19524388803/job/55894181408):

-- ccpp-capgen completed successfully
-- Running ccpp_datafile from /home/runner/work/ccpp-framework/ccpp-framework/test/advection_test
'/home/runner/work/ccpp-framework/ccpp-framework/scripts/ccpp_datafile.py' '--ccpp-files' '/home/runner/work/ccpp-framework/ccpp-framework/build/test/advection_test/ccpp/datatable.xml'
-- CCPP_CAPS = /home/runner/work/ccpp-framework/ccpp-framework/build/test/advection_test/ccpp/ccpp_kinds.F90,/home/runner/work/ccpp-framework/ccpp-framework/src/ccpp_constituent_prop_mod.F90,/home/runner/work/ccpp-framework/ccpp-framework/src/ccpp_scheme_utils.F90,/home/runner/work/ccpp-framework/ccpp-framework/src/ccpp_hashable.F90,/home/runner/work/ccpp-framework/ccpp-framework/src/ccpp_hash_table.F90,/home/runner/work/ccpp-framework/ccpp-framework/build/test/advection_test/ccpp/test_host_ccpp_cap.F90,/home/runner/work/ccpp-framework/ccpp-framework/build/test/advection_test/ccpp/ccpp_cld_suite_cap.F90
-- CCPP cap files retrieved
-- Creating output directory: /home/runner/work/ccpp-framework/ccpp-framework/build/test/capgen_test/ccpp
-- Running ccpp_capgen.py from /home/runner/work/ccpp-framework/ccpp-framework/test/capgen_test
'/home/runner/work/ccpp-framework/ccpp-framework/scripts/ccpp_capgen.py' '--debug' '--host-files' 'test_host_data.meta,test_host_mod.meta,test_host.meta' '--scheme-files' 'setup_coeffs.meta,temp_set.meta,temp_adjust.meta,temp_calc_adjust.meta,make_ddt.meta,environ_conditions.meta' '--suites' 'ddt_suite.xml,temp_suite.xml' '--host-name' 'test_host' '--output-root' '/home/runner/work/ccpp-framework/ccpp-framework/build/test/capgen_test/ccpp'
CMake Error at cmake/ccpp_capgen.cmake:80 (message):
  CCPP cap generation FAILED: result = 1
Call Stack (most recent call first):
  test/capgen_test/CMakeLists.txt:51 (ccpp_capgen)


-- ccpp-capgen stdout: b'/home/runner/work/ccpp-framework/ccpp-framework/test/capgen_test/ddt_suite.xml validates\n'
b'/home/runner/work/ccpp-framework/ccpp-framework/test/capgen_test/temp_suite.xml validates\n'
Traceback (most recent call last):
  File "/home/runner/work/ccpp-framework/ccpp-framework/scripts/metavar.py", line 1011, in conditional
    int(item)
ValueError: invalid literal for int() with base 10: '('

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/runner/work/ccpp-framework/ccpp-framework/scripts/ccpp_capgen.py", line 752, in <module>
    _main_func()
  File "/home/runner/work/ccpp-framework/ccpp-framework/scripts/ccpp_capgen.py", line 745, in _main_func
    _ = capgen(framework_env)
  File "/home/runner/work/ccpp-framework/ccpp-framework/scripts/ccpp_capgen.py", line 700, in capgen
    ccpp_api = API(sdfs, host_model, scheme_headers, run_env)
  File "/home/runner/work/ccpp-framework/ccpp-framework/scripts/ccpp_suite.py", line 698, in __init__
    suite.analyze(self.host_model, scheme_library,
  File "/home/runner/work/ccpp-framework/ccpp-framework/scripts/ccpp_suite.py", line 460, in analyze
    item.analyze(phase, self, scheme_library, ddt_library,
  File "/home/runner/work/ccpp-framework/ccpp-framework/scripts/suite_objects.py", line 2364, in analyze
    lschemes = item.analyze(phase, self, scheme_library, suite_vars, 1)
  File "/home/runner/work/ccpp-framework/ccpp-framework/scripts/suite_objects.py", line 2130, in analyze
    smods = item.analyze(phase, group, scheme_library,
  File "/home/runner/work/ccpp-framework/ccpp-framework/scripts/suite_objects.py", line 1292, in analyze
    scheme_mods = self.parent.analyze(phase, group, scheme_library,
  File "/home/runner/work/ccpp-framework/ccpp-framework/scripts/suite_objects.py", line 2014, in analyze
    smods = item.analyze(phase, group, scheme_library,
  File "/home/runner/work/ccpp-framework/ccpp-framework/scripts/suite_objects.py", line 1215, in analyze
    self.add_var_debug_check(dict_var)
  File "/home/runner/work/ccpp-framework/ccpp-framework/scripts/suite_objects.py", line 1324, in add_var_debug_check
    (_, vars_needed) = var.conditional(var_dicts)
  File "/home/runner/work/ccpp-framework/ccpp-framework/scripts/metavar.py", line 1020, in conditional
    raise Exception(f"Cannot find variable '{item}' for generating conditional for '{active}'")
Exception: Cannot find variable '(' for generating conditional for '(index_of_water_vapor_specific_humidity > 0)'

-- Configuring incomplete, errors occurred!

@mwaxmonsky
Collaborator

@climbfuji You could set the verbosity level explicitly in the ccpp_capgen call in the CMake file for the capgen test (VERBOSITY ${CCPP_VERBOSITY} -> VERBOSITY 1), and we should then be able to see the error message coming from the capgen scripts.

@climbfuji climbfuji force-pushed the feature/nested_suites branch from db06414 to e0d375b Compare November 20, 2025 22:20
@climbfuji
Collaborator Author

@climbfuji You could set the verbosity level explicitly in the ccpp_capgen call in the CMake file for the capgen test (VERBOSITY ${CCPP_VERBOSITY} -> VERBOSITY 1), and we should then be able to see the error message coming from the capgen scripts.

Thanks. I was able to reproduce this locally, too. But I ended up backing out the PR from @gold2718 entirely, because I am running out of time before the Thanksgiving break and I really want to get this in. Because I force-pushed the older commit, @gold2718 can simply direct his branch/PR to NCAR develop after my PR goes in, and there won't be any merge conflicts etc. Besides, the changes in @gold2718's PR didn't really have anything to do with my changes, so it's better to keep them separate anyway.

I will address the other questions/issues directly related to my PR before requesting a final review from @gold2718.

@climbfuji climbfuji requested a review from gold2718 November 20, 2025 23:33

Labels

capgen-unification Issues/PRs necessary for capgen/prebuild unification

Projects

Status: In progress

Development

Successfully merging this pull request may close these issues.

Add suite keyword to suite definition file?

6 participants