
@anaprietonem
Contributor

@anaprietonem anaprietonem commented Jul 23, 2025

Description

Pydantic schemas were not resolving correctly and were ending up in the model's metadata and internal class names. This PR resolves that by:

  • Moving the data-processor schemas to models. They were initially left in training because they are defined in the 'data' config, but they are ultimately model layers. Note that this is a breaking change.
  • Testing at the beginning of training that there are no pydantic schemas in the model, and crashing otherwise.
  • Adding tests in the data processors to ensure all schemas work. (This mitigates the fact that the default configs only include tests for normalizers, so other processors commonly used for Earth System Models, such as imputers or post-processors, are not tested.)
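To illustrate why a schema object stored in the model's metadata shows up as a module reference in the checkpoint pickle, here is a minimal stdlib-only sketch. It assumes plain `pickle` semantics (PyTorch checkpoints are pickle-based); `argparse.Namespace` stands in for a pydantic schema class, and the helper `pickled_globals` is hypothetical, not part of anemoi.

```python
import argparse
import pickle
import pickletools


def pickled_globals(obj) -> set[str]:
    """Hypothetical helper: 'module name' pairs that loading a protocol-3 pickle of obj imports."""
    return {
        arg
        for opcode, arg, _pos in pickletools.genops(pickle.dumps(obj, protocol=3))
        if opcode.name == "GLOBAL"  # protocol 3 emits each class reference as GLOBAL
    }


# argparse.Namespace stands in for a config schema object (e.g. a pydantic model).
schema_obj = argparse.Namespace(default="mean-std", remap={"cp": "tp"})

# Keeping the object itself in metadata records its class's defining module...
print(pickled_globals({"normalizer": schema_obj}))  # {'argparse Namespace'}
# ...while plain built-in data (a dict, as pydantic v2's model_dump() returns) records none.
print(pickled_globals({"normalizer": vars(schema_obj)}))  # set()
```

Loading such a pickle imports every recorded module, which is how a reference to `anemoi.training.schemas` makes an inference checkpoint depend on anemoi-training. This PR addresses it by moving the schemas to models; the plain-dict variant is shown only as a contrast.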

What problem does this change solve?

Issue - #421

Additional notes

As a contributor to the Anemoi framework, please ensure that your changes include unit tests, updates to any affected dependencies and documentation, and have been tested in a parallel setting (i.e., with multiple GPUs). As a reviewer, you are also responsible for verifying these aspects and requesting changes if they are not adequately addressed. For guidelines on these, please refer to https://anemoi.readthedocs.io/en/latest/

By opening this pull request, I affirm that all authors agree to the Contributor License Agreement.

@anaprietonem anaprietonem added the ATS Approval Needed (Approval needed by ATS) label Jul 23, 2025
@anaprietonem anaprietonem changed the title wip fix for schemas of data processors fix:! for schemas of data processors Jul 23, 2025
@anaprietonem anaprietonem changed the title fix:! for schemas of data processors fix!: for schemas of data processors Jul 23, 2025
@anaprietonem anaprietonem marked this pull request as ready for review July 24, 2025 06:50
@anaprietonem anaprietonem self-assigned this Jul 24, 2025
@anaprietonem
Contributor Author

anaprietonem commented Jul 24, 2025

@sahahner I have extended the tests. Could you check that it works and take a look?
@gmertes I have now added a check in our ModelCheckpoint callback to catch this problem. I tested it against main and it caught the problem as expected. The check only needs to run at the beginning of training.

sahahner
sahahner previously approved these changes Jul 25, 2025
Member

@sahahner sahahner left a comment

Thank you for correcting the schemas for the pre/postprocessors. They work fine for my config files, and the tests for the schemas look good to me.

@anaprietonem
Contributor Author

@gmertes I have updated the check to look for any reference to 'anemoi.training' rather than the schemas specifically. I updated the branch and tested it, and it seems fine (i.e. if I test this against the current main, it crashes as expected).
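A minimal sketch of such a start-of-training check, assuming the object to inspect can be serialized with the standard `pickle` module. The helpers `find_module_references` and `assert_no_forbidden_references` are hypothetical illustrations, not the code merged in this PR. Pickling with protocol 3 makes every class or function reference appear as a `GLOBAL` opcode whose argument is `"module name"`, which `pickletools.genops` exposes.

```python
import pickle
import pickletools
from typing import Any


def find_module_references(obj: Any, forbidden_prefix: str) -> set[str]:
    """Hypothetical helper: globals under forbidden_prefix that a pickle of obj refers to."""
    # Protocol 3 writes each class/function reference as a GLOBAL opcode "module name".
    payload = pickle.dumps(obj, protocol=3)
    found = set()
    for opcode, arg, _pos in pickletools.genops(payload):
        if opcode.name != "GLOBAL":
            continue
        module, name = arg.split(" ", 1)
        # Match the package itself or any submodule, e.g. "anemoi.training.schemas".
        if module == forbidden_prefix or module.startswith(forbidden_prefix + "."):
            found.add(f"{module}.{name}")
    return found


def assert_no_forbidden_references(obj: Any, forbidden_prefix: str = "anemoi.training") -> None:
    """Hypothetical check: fail fast if obj would pickle references to forbidden_prefix."""
    refs = find_module_references(obj, forbidden_prefix)
    if refs:
        raise RuntimeError(f"Object pickles references to {forbidden_prefix}: {sorted(refs)}")
```

Protocols 4 and above use `STACK_GLOBAL` instead, whose module name is pushed as a separate string, so fixing the protocol at 3 keeps the scan to a single opcode type.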

@anaprietonem anaprietonem added the ATS Approved (Approved by ATS) label Jul 30, 2025
Member

@gmertes gmertes left a comment

Very nice, thanks for fixing this bug!

@gmertes gmertes merged commit 539939b into main Jul 30, 2025
19 of 20 checks passed
@gmertes gmertes deleted the fix/schemas_in_metadata branch July 30, 2025 12:13
@github-project-automation github-project-automation bot moved this from Now In Progress to Done in Anemoi-dev Jul 30, 2025
@DeployDuck DeployDuck mentioned this pull request Jul 30, 2025
anaprietonem pushed a commit that referenced this pull request Aug 4, 2025
🤖 Automated Release PR

This PR was created by `release-please` to prepare the next release.
Once merged:

1. A new version tag will be created
2. A GitHub release will be published
3. The changelog will be updated

Changes to be included in the next release:
---


<details><summary>training: 0.6.0</summary>

## [0.6.0](training-0.5.1...training-0.6.0) (2025-08-01)


### ⚠ BREAKING CHANGES

* for schemas of data processors
([#433](#433))
* BaseGraphModule and tasks introduced in anemoi-core
([#399](#399))

### Features

* Add metadata back to pl checkpoint.
([#303](#303))
([0193b28](0193b28))
* BaseGraphModule and tasks introduced in anemoi-core
([#399](#399))
([f8ab962](f8ab962))
* **deps:** Use mlflow-skinny instead of mlflow
([#418](#418))
([6a8beb3](6a8beb3))
* Log FTT2 loss + Fourier Correlation loss
([#148](#148))
([345b0ab](345b0ab))
* **model:** Postprocessors for leaky boundings
([#315](#315))
([b54562b](b54562b))
* **models:** Checkpointed Mapper Chunking
([#406](#406))
([8577772](8577772))
* **models:** Mapper edge sharding
([#366](#366))
([326751d](326751d))
* Variable filtering
([#208](#208))
([fba5e47](fba5e47))


### Bug Fixes

* Dropping 3.9 ([#436](#436))
([f6c0214](f6c0214))
* For schemas of data processors
([#433](#433))
([539939b](539939b))
* Mlflow hp params limit
([#424](#424))
([138bc3a](138bc3a))
* Mlflowlogger duplicated key
([#414](#414))
([cb64a1c](cb64a1c))
* **models,training:** Hierarchical model + integration test
([#400](#400))
([71dfd89](71dfd89))
* **models:** Add removed sharded_input_key in PR400
([#425](#425))
([089fe6f](089fe6f))
* New checkpoint
([#445](#445))
([a25df93](a25df93))
* Plotting error when precip related params are not diagnostic
([#369](#369))
([010cfa3](010cfa3))
* **training:** Address issues with
[#208](#208)
([#417](#417))
([665f462](665f462))
* **training:** Scaler memory usage
([#391](#391))
([a9d30e1](a9d30e1))
* Update import mlflow utils unit tests
([#427](#427))
([70ecdd9](70ecdd9))
* Update level retrieval logic
([#405](#405))
([f393bc3](f393bc3))
* Use transforms: Variable for ExtractVariableGroupAndLevel
([#321](#321))
([7649f4f](7649f4f))
* Warm restart ([#443](#443))
([ff96236](ff96236))


### Documentation

* **graphs:** Documenting some missing features
([#423](#423))
([8addbd8](8addbd8))
</details>

<details><summary>graphs: 0.6.3</summary>

## [0.6.3](graphs-0.6.2...graphs-0.6.3) (2025-08-01)


### Features

* **graphs:** Add lat weighted attribute
([#223](#223))
([5dd32ca](5dd32ca))
* **graphs:** Support to export edges to npz
([#395](#395))
([e21738f](e21738f))


### Bug Fixes

* Dropping 3.9 ([#436](#436))
([f6c0214](f6c0214))
* **graphs:** Revert PR
[#379](#379)
([#409](#409))
([d51219f](d51219f))
* **graphs:** Throw error instead of raising warning when graph exists.
([#379](#379))
([6ec6c18](6ec6c18))
* **graphs:** Undo masking when torch-cluster is installed
([#375](#375))
([9f75c06](9f75c06))


### Documentation

* **graphs:** Documenting some missing features
([#423](#423))
([8addbd8](8addbd8))
</details>

<details><summary>models: 0.9.0</summary>

## [0.9.0](models-0.8.1...models-0.9.0) (2025-08-01)


### ⚠ BREAKING CHANGES

* for schemas of data processors
([#433](#433))

### Features

* **model:** Postprocessors for leaky boundings
([#315](#315))
([b54562b](b54562b))
* **models:** Checkpointed Mapper Chunking
([#406](#406))
([8577772](8577772))
* **models:** Mapper edge sharding
([#366](#366))
([326751d](326751d))


### Bug Fixes

* Dropping 3.9 ([#436](#436))
([f6c0214](f6c0214))
* For schemas of data processors
([#433](#433))
([539939b](539939b))
* **models,training:** Hierarchical model + integration test
([#400](#400))
([71dfd89](71dfd89))
* **models:** Remove repeated lines
([#377](#377))
([1f0b861](1f0b861))
* **models:** Uneven channel sharding
([#385](#385))
([dd095c4](dd095c4))
* Pydantic model validator not working in transformer schema
([#422](#422))
([42f437a](42f437a))
* Remove dead code and fix typo
([#357](#357))
([8c615ba](8c615ba))
</details>

---
> [!IMPORTANT]
> Please do not change the PR title, manifest file, or any other
> automatically generated content in this PR unless you understand the
> implications. Changes here can break the release process.
> 
> ⚠️ Merging this PR will:
> - Create a new release
> - Trigger deployment pipelines
> - Update package versions

 **Before merging:**
 - Ensure all tests pass
 - Review the changelog carefully
 - Get required approvals

[Release-please
documentation](https://github.com/googleapis/release-please)

Labels

ATS Approval Needed (Approval needed by ATS), ATS Approved (Approved by ATS), models, training

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Reference to anemoi.training.schemas in inference checkpoint pickle

4 participants