feat(training): autoencoder 🗜️ #252

icedoom888 · 2025-04-10T13:13:33Z

Description

Introduces Autoencoder training in Anemoi.

Type of Change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update

Issue Number

Closes #171
Reopens #172

Code Compatibility

I have performed a self-review of my code

Code Performance and Testing

I have added tests that prove my fix is effective or that my feature works
I ran the complete Pytest test suite locally, and they pass
I have tested the changes on a single GPU
I have tested the changes on multiple GPUs / multi-node setups
I have run the Benchmark Profiler against the old version of the code
If the new feature introduces modifications at the config level, I have made sure to update Pydantic Schemas and default configs accordingly

Dependencies

I have ensured that the code is still pip-installable after the changes and runs
I have tested that new dependencies themselves are pip-installable.
I have not introduced new dependencies in the inference portion of the pipeline

Documentation

My code follows the style guidelines of this project
I have updated the documentation and docstrings to reflect the changes
I have added comments to my code, particularly in hard-to-understand areas

Additional Notes

📚 Documentation preview 📚: https://anemoi-training--252.org.readthedocs.build/en/252/

📚 Documentation preview 📚: https://anemoi-graphs--252.org.readthedocs.build/en/252/

📚 Documentation preview 📚: https://anemoi-models--252.org.readthedocs.build/en/252/


models/src/anemoi/models/models/autoencoder.py

mchantry · 2025-10-14T10:39:37Z

@icedoom888 please can you push the branch to this repo too, so we can all run the integration tests. Many thanks.

icedoom888 · 2025-10-14T13:53:51Z

@mchantry integration tests passing here: https://github.com/ecmwf/anemoi-core/actions/runs/18498383834

mchantry · 2025-10-14T15:17:10Z

> @mchantry integration tests passing here: https://github.com/ecmwf/anemoi-core/actions/runs/18498383834

Great, thanks so much.

mchantry · 2025-10-14T15:59:02Z

@icedoom888 sorry if you have already discussed this. Have you tried using the current forecasting dataset/datamodule but with rollout=0? I believe this will give you the time slices that you get from the singledataset setup, without needing to create a new class. Not adding a new class will help when implementing multiple-dataset support in Anemoi.

Rilwan-Adewoyin · 2025-10-15T08:51:40Z

@icedoom888 Can you add a config for the hierarchicalautoencoder, currently I believe there are only exemplar configs for the autoencoder.
I suspect you would only need to add a file here: training/src/anemoi/training/config/hierarchical_autoencoder.yaml

icedoom888 · 2025-10-15T11:37:37Z

> @icedoom888 Can you add a config for the hierarchicalautoencoder, currently I believe there are only exemplar configs for the autoencoder. I suspect you would only need to add a file here: training/src/anemoi/training/config/hierarchical_autoencoder.yaml

Done!

Rilwan-Adewoyin · 2025-10-15T12:11:01Z

As mentioned by @mc4117 (https://github.com/ecmwf/anemoi-core/pull/252/files#r2426852227), you've added some great plot visualisation changes in this PR.

I've noted the following two aspects:

Improved visualisation when plotting smaller non-global regions (def lambert_conformal_from_latlon_points)
Ability for Map Plot based callbacks to only plot over a subset of grid points (FocusArea)

It seems like this set of changes may be best placed in a second PR.

A second PR could be reviewed by some of the primary contributors to the existing plotting logic, who may have useful suggestions for improving it or making sure it extends to more use cases; currently these changes only apply to the callbacks that plot maps.

Beyond this, I think the other plot functions essential for reconstruction plotting can stay in this PR.
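For readers unfamiliar with the two features listed above, a rough sketch of the ideas follows. Both functions are hypothetical and are not the PR's `lambert_conformal_from_latlon_points` or `FocusArea` code: `focus_area_mask` selects the grid points inside a latitude/longitude box, and `lambert_conformal_params` derives Lambert conformal parameters for the selected region, placing the standard parallels with the common one-sixth rule for conic projections and assuming the region does not cross the antimeridian.

```python
from __future__ import annotations

import numpy as np


def focus_area_mask(
    lats: np.ndarray,
    lons: np.ndarray,
    lat_bounds: tuple[float, float],
    lon_bounds: tuple[float, float],
) -> np.ndarray:
    """Hypothetical FocusArea-style selection: True for grid points inside a lat/lon box."""
    lats = np.asarray(lats, dtype=float)
    lons = np.asarray(lons, dtype=float)
    in_lat = (lats >= lat_bounds[0]) & (lats <= lat_bounds[1])
    in_lon = (lons >= lon_bounds[0]) & (lons <= lon_bounds[1])
    return in_lat & in_lon


def lambert_conformal_params(lats: np.ndarray, lons: np.ndarray) -> dict[str, object]:
    """Hypothetical helper: Lambert conformal parameters centred on a regional set of points.

    Assumes the points do not straddle the antimeridian.
    """
    lat_min, lat_max = float(np.min(lats)), float(np.max(lats))
    lon_min, lon_max = float(np.min(lons)), float(np.max(lons))
    lat_span = lat_max - lat_min
    return {
        "central_longitude": (lon_min + lon_max) / 2,
        "central_latitude": (lat_min + lat_max) / 2,
        # One-sixth rule: standard parallels one sixth of the span in from each edge
        "standard_parallels": (lat_min + lat_span / 6, lat_max - lat_span / 6),
    }


# Example: four points, of which only the middle two fall inside the focus box
lats = np.array([40.0, 45.0, 50.0, 55.0])
lons = np.array([0.0, 5.0, 10.0, 15.0])
mask = focus_area_mask(lats, lons, (44.0, 51.0), (4.0, 11.0))
params = lambert_conformal_params(lats[mask], lons[mask])
```

The returned dictionary uses keyword names accepted by `cartopy.crs.LambertConformal`, so it could be passed as `ccrs.LambertConformal(**params)`; whether the PR builds its projection this way is an assumption.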

icedoom888 · 2025-10-15T12:16:37Z

> @icedoom888 sorry if you have already discussed this. Have you tried using the current forecasting dataset/datamodule but with rollout=0? I believe this will give you the time slices that you get from the singledataset setup, without needing to create a new class. Not adding a new class will help when implementing multiple-datasets for anemoi.

# Fallback if max is None or rollout_cfg is missing
rollout_value = rollout_start
if rollout_cfg and rollout_epoch_increment > 0 and rollout_max is not None:
    rollout_value = rollout_max
else:
    LOGGER.warning(
        "Falling back rollout to: %s",
        rollout_value,
    )

rollout = max(rollout_value, val_rollout)

This code from /users/apennino/anemoi-core/training/src/anemoi/training/data/datamodule/singledatamodule.py forces rollout to be the maximum of rollout_value and val_rollout. Since the schema requires val_rollout to be at least 1, rollout can never be 0.
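To make the constraint concrete, here is a minimal standalone restatement of the quoted fallback logic. `resolve_rollout` is a hypothetical wrapper; the parameter names and `LOGGER` come from the snippet above. It shows that as long as `val_rollout` is at least 1, the resolved rollout is at least 1, even when `rollout_start` is 0.

```python
from __future__ import annotations

import logging

LOGGER = logging.getLogger(__name__)


def resolve_rollout(
    rollout_cfg: dict | None,
    rollout_start: int,
    rollout_epoch_increment: int,
    rollout_max: int | None,
    val_rollout: int,
) -> int:
    """Hypothetical wrapper around the rollout fallback logic quoted from singledatamodule.py."""
    # Fallback if max is None or rollout_cfg is missing
    rollout_value = rollout_start
    if rollout_cfg and rollout_epoch_increment > 0 and rollout_max is not None:
        rollout_value = rollout_max
    else:
        LOGGER.warning("Falling back rollout to: %s", rollout_value)
    # The validation rollout acts as a floor on the resolved rollout
    return max(rollout_value, val_rollout)


# rollout_start = 0 is still overridden by val_rollout = 1
print(resolve_rollout({"start": 0}, 0, 0, None, 1))  # → 1
```

Only when `val_rollout` itself is allowed to be 0 (for example, after relaxing the schema to `NonNegativeInt`) can this function return 0.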

icedoom888 · 2025-10-15T12:36:25Z

@mchantry after changing every single rollout schema to NonNegativeInt to allow rollout to be 0, I now get:

[rank0]: IndexError: Caught IndexError in DataLoader worker process 0.
[rank0]: Original Traceback (most recent call last):
[rank0]:   File "/users/apennino/anaconda3/envs/anemoi/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 351, in _worker_loop
[rank0]:     data = fetcher.fetch(index)  # type: ignore[possibly-undefined]
[rank0]:   File "/users/apennino/anaconda3/envs/anemoi/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 33, in fetch
[rank0]:     data.append(next(self.dataset_iter))
[rank0]:   File "/users/apennino/anemoi-core/training/src/anemoi/training/data/dataset/singledataset.py", line 284, in __iter__
[rank0]:     timeincrement = self.relative_date_indices[1] - self.relative_date_indices[0]
[rank0]: IndexError: list index out of range

This happens because the standard dataset and dataloader expect a list of at least two date indices, not just one, which shows the need for my implementation.

icedoom888 · 2025-10-15T13:21:26Z

@Rilwan-Adewoyin Thanks for reviewing!
I use these plot callbacks in the default autoencoder configurations. In particular, PlotReconstruction is an important part of training, since it is how we visualize the model output. How do you suggest we handle this?


mc4117 · 2025-10-20T09:01:20Z

> @mchantry after changing every single rollout schema to NonNegativeInt to allow for rollout to 0, I now get:
>
> [rank0]: IndexError: Caught IndexError in DataLoader worker process 0.
> [rank0]: Original Traceback (most recent call last):
> [rank0]:   File "/users/apennino/anaconda3/envs/anemoi/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 351, in _worker_loop
> [rank0]:     data = fetcher.fetch(index)  # type: ignore[possibly-undefined]
> [rank0]:   File "/users/apennino/anaconda3/envs/anemoi/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 33, in fetch
> [rank0]:     data.append(next(self.dataset_iter))
> [rank0]:   File "/users/apennino/anemoi-core/training/src/anemoi/training/data/dataset/singledataset.py", line 284, in __iter__
> [rank0]:     timeincrement = self.relative_date_indices[1] - self.relative_date_indices[0]
> [rank0]: IndexError: list index out of range
>
> This happens because the normal dataset and dataloader are expecting a list of date indices, not just one. Hence proving the need for my implementation.

I ran some tests this morning, and I think all that is needed is an if statement: if len(relative_date_indices) == 1, set timeincrement = 1, and then this should run. Let me know if this works for you too.
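The guard suggested above can be sketched as a small standalone function. `time_increment` is a hypothetical name; in singledataset.py the check would presumably sit inline in `__iter__`, just before the line that computes `timeincrement` from `self.relative_date_indices`. The fallback value of 1 is the one proposed in the comment.

```python
from __future__ import annotations


def time_increment(relative_date_indices: list[int]) -> int:
    """Hypothetical helper mirroring the suggested guard in singledataset.py."""
    # A single date index (e.g. autoencoder-style samples) has no spacing to measure,
    # so fall back to 1 instead of indexing past the end of the list
    if len(relative_date_indices) == 1:
        return 1
    # Otherwise keep the existing behaviour: spacing between the first two indices
    return relative_date_indices[1] - relative_date_indices[0]


print(time_increment([0]))         # → 1
print(time_increment([0, 6, 12]))  # → 6
```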


icedoom888 and others added 30 commits February 5, 2025 10:13

introduced ew tasks

ab1e84d

Autoencoder working, hierarchical not yet

81e3cdd

Debugging sharding for hierarchical

d4ad775

Merge branch 'main' into feature/autoencoder

2e8bb13

Added schema

01e0d30

Added documentation and default configs

1611782

Update schema

58f1bf3

Minor fixes

09c0e0e

Merge branch 'main' into feature/autoencoder

c71f553

Removed print

a036ad1

Merge branch 'feature/autoencoder' of github.com:MeteoSwiss/anemoi-co…re into feature/autoencoder

ae37019

Merge branch 'main' into feature/autoencoder

67be4ef

Merged main and fixed conflicts

52c1b2e

Refactor autoencoder

01451f9

Merge branch 'main' into feature/autoencoder

d5d849e

Merge main

42d4de0

Refactor wo fit with new structure of anemoi-training

88e8767

gpc

e6937ca

refactor

6143d89

gpc

4ab4370

Refactor

92fe65c

Merge branch 'main' into feature/autoencoder

eb053f6

Added confg and pathced interpolator

b1d009f

Merge branch 'feature/autoencoder' of github.com:MeteoSwiss/anemoi-co…re into feature/autoencoder

a2d5764

bugfix: rstep passed to calculate_val_matrics for AutoEncoder

b3d48df

Update default autoencoder yaml to include spectrum plots

cf0b3fe

fix - rstep idx passed by autoencoder should be 0

f3654f4

re-ordering imports to prevent circular imports

dc50b28

Allow truncation data to be passed as None during init

96cf715

update config to include config_validation

5f54200

mc4117 reviewed Oct 13, 2025

models/src/anemoi/models/models/autoencoder.py (outdated, resolved)

Addressing comments and remove redundant code

404edcd

icedoom888 and others added 2 commits October 15, 2025 11:18

Merge branch 'main' into feature/autoencoder

3dd775f

Refactor and extra config for hierarchical autoencoder

05bf7b6

refactor

8f2bce5

icedoom888 added 3 commits October 15, 2025 14:38

Small changes

c5b1d72

Refactor spatial mask and schema

e395751

GPC errors fixed

1bfe544

icedoom888 and others added 5 commits October 15, 2025 15:21

Merge branch 'main' into feature/autoencoder

119716c

Minor fix

872c6e3

Merge branch 'feature/autoencoder' of github.com:MeteoSwiss/anemoi-co…re into feature/autoencoder

97724f4

Merge branch 'main' into feature/autoencoder

faaa608

Merge branch 'main' into feature/autoencoder

ed7c78e

icedoom888 added 6 commits October 29, 2025 18:19

Code

92fceae

Added change

64568d5

Fixes

e69cdb5

Removed tiny comp

a1c5597

Merge branch 'feature/autoencoder' of github.com:MeteoSwiss/anemoi-co…re into feature/autoencoder

95b91ef

quick fix to shape in plot

f6f6b91

icedoom888 commented Apr 10, 2025 • edited by github-actions bot


6 participants
