
Conversation


@ssmmnn11 ssmmnn11 commented Jun 27, 2025

Implement the multiscale loss, as proposed in
https://arxiv.org/abs/2506.10868

The MultiscaleLossWrapper wraps around already implemented losses and applies them at user-defined scales. The user provides "truncation matrices" that decrease the resolution at the individual scales. The following is an example loss config:

training_loss:
  _target_: anemoi.training.losses.MultiscaleLossWrapper
  truncation_path: ${hardware.paths.truncation}
  filenames: ${hardware.files.truncation_loss}
  weights:
    - 1.0
    - 1.0
  keep_batch_sharded: ${model.keep_batch_sharded}

  internal_loss:
    _target_: anemoi.training.losses.kcrps.AlmostFairKernelCRPS
    scalers: ['node_weights']
    ignore_nans: False
    no_autocast: True
    alpha: 0.95

Of note, the corresponding filenames always require a None value added to the truncation matrices.
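To illustrate how such a wrapper can work, here is a minimal sketch (not the anemoi implementation; the class name MultiscaleLossSketch, its arguments and the shape conventions are illustrative assumptions): a None entry stands for the native resolution, every other entry is a projection matrix to a coarser scale, and the wrapped loss is evaluated per scale and combined with the configured weights.

```python
import torch
from torch import nn


class MultiscaleLossSketch(nn.Module):
    """Illustrative sketch of a multi-scale loss wrapper (not the anemoi code).

    Each entry of `truncation_matrices` is either None (native resolution)
    or an (n_coarse, n_fine) matrix projecting fields to a coarser scale.
    """

    def __init__(self, internal_loss, truncation_matrices, weights):
        super().__init__()
        assert len(truncation_matrices) == len(weights)
        self.internal_loss = internal_loss
        # Kept as a plain list for simplicity; a real implementation would
        # register these as buffers so they move with the module's device.
        self.truncation_matrices = truncation_matrices
        self.weights = weights

    def _project(self, x, trunc):
        # x has shape (..., n_gridpoints, n_vars); project along the grid dimension.
        if trunc is None:
            return x
        return torch.einsum("og,...gv->...ov", trunc, x)

    def forward(self, pred, target):
        # Weighted sum of the wrapped loss evaluated at every scale.
        total = pred.new_zeros(())
        for trunc, weight in zip(self.truncation_matrices, self.weights):
            total = total + weight * self.internal_loss(
                self._project(pred, trunc), self._project(target, trunc)
            )
        return total
```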

The multi-scale loss will now be treated as the default loss for the ensemble model.

The aggregated loss over the various scales will be tracked in the validation metrics. A future PR will focus on implementing loss tracking for individual variables.


📚 Documentation preview 📚: https://anemoi-training--388.org.readthedocs.build/en/388/


📚 Documentation preview 📚: https://anemoi-graphs--388.org.readthedocs.build/en/388/


📚 Documentation preview 📚: https://anemoi-models--388.org.readthedocs.build/en/388/

@ssmmnn11 ssmmnn11 marked this pull request as draft July 4, 2025 14:53
@github-project-automation github-project-automation bot moved this to Now In Progress in Anemoi-dev Jul 10, 2025
@ssmmnn11 ssmmnn11 requested a review from JPXKQX July 12, 2025 07:16
@ssmmnn11 ssmmnn11 marked this pull request as ready for review July 17, 2025 14:23
@ssmmnn11 ssmmnn11 changed the title from Feat/kcrps multi scale loss to feat/kcrps multi scale loss Jul 17, 2025
@ssmmnn11 ssmmnn11 added the documentation Improvements or additions to documentation label Jul 17, 2025
Member

@JPXKQX JPXKQX left a comment


Thanks for the PR, Simon! I have two general questions,

  1. Do you think this should be specific to the ensemble model? I am not sure if a multi-scale MSE would produce good results, but it could still be interesting to see it used as a validation metric.
  2. What are your thoughts on where this functionality should be located (as part of the forecaster or as a new loss function)? Are there any performance issues we should be aware of?

@ssmmnn11
Member Author

Thanks for the PR, Simon! I have two general questions,

  1. Do you think this should be specific to the ensemble model? I am not sure if a multi-scale MSE would produce good results, but it could still be interesting to see it used as a validation metric.

Probably does not make much sense for the single model. But could be tested, and having it as an option would not hurt.

  2. What are your thoughts on where this functionality should be located (as part of the forecaster or as a new loss function)? Are there any performance issues we should be aware of?

It has a performance impact if it is used - this is unavoidable, so we should be able to switch it off. It could be a new loss function, but the way it is implemented now means it can be used with any loss function - therefore I see it more as a wrapper around a loss function.
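As an illustration of the wrapper idea, the hedged MultiscaleLossSketch from the sketch above works unchanged with a plain MSE as the internal loss (again purely illustrative, not an anemoi API):

```python
import torch

# Any loss with a (pred, target) signature can be wrapped, which is what
# makes this a wrapper rather than a dedicated loss function.
loss_fn = MultiscaleLossSketch(
    internal_loss=torch.nn.MSELoss(),
    truncation_matrices=[None, torch.full((4, 8), 1.0 / 8)],  # native + one coarse scale
    weights=[1.0, 1.0],
)
pred = torch.randn(2, 8, 3)    # (batch, grid points, variables)
target = torch.randn(2, 8, 3)
print(loss_fn(pred, target))
```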

@mchantry mchantry added the ATS Approval Needed Approval needed by ATS label Sep 9, 2025
@theissenhelen theissenhelen changed the title from feat/kcrps multi scale loss to feat: kcrps multi scale loss Nov 4, 2025
@mchantry mchantry changed the title from feat: kcrps multi scale loss to feat: multi-scale loss implementations (including multi-scale kcrps) Nov 12, 2025
ssmmnn11 and others added 3 commits November 17, 2025 11:41
Additional fixes to make multiple scales and metrics work

---------

Co-authored-by: theissenhelen <[email protected]>
@anaprietonem anaprietonem added the ATS Approved Approved by ATS label Nov 19, 2025
@github-actions github-actions bot added the bug Something isn't working label Nov 19, 2025
@OpheliaMiralles
Contributor

What is a scale in this context? Filtering on grid points for specific areas and re-weighting the total loss? There is a FilteringLossWrapper that was used for filtering out some variables and could be used the same way for grid points, then wrapped in a CombinedLoss to associate a weight with each spatial area. But maybe I didn't get it exactly and it is doing more than that.

@ssmmnn11
Member Author

What is a scale in this context? Filtering on grid points for specific areas and re-weighting the total loss? There is a FilteringLossWrapper that was used for filtering out some variables and could be used the same way for grid points, then wrapped in a CombinedLoss to associate a weight with each spatial area. But maybe I didn't get it exactly and it is doing more than that.

This is for scale-aware training; we explain it here: https://arxiv.org/abs/2506.10868

@OpheliaMiralles
Contributor

What is a scale in this context? Filtering on grid points for specific areas and re-weighting the total loss? There is a FilteringLossWrapper that was used for filtering out some variables and could be used the same way for grid points, then wrapped in a CombinedLoss to associate a weight with each spatial area. But maybe I didn't get it exactly and it is doing more than that.

This is for scale-aware training; we explain it here: https://arxiv.org/abs/2506.10868

I saw the paper, I just wonder how this is different from constraining the loss on the spatial dimension to a subset of grid points, doing that for a set of truncation matrices, and aggregating with weights. If that is the case, it could maybe be done with a CombinedLoss + FilteringLossWrapper, because that is how I worked with it on the variable dimension, and it would not change much to have truncation matrices instead of specific variable names. Did anyone try to implement it using existing code?
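For illustration, the difference between the two approaches can be sketched in a few lines of plain PyTorch (illustrative only, not anemoi code): filtering selects a subset of grid points and leaves their values untouched, while a truncation matrix builds every coarse value as a weighted combination of fine-grid values, i.e. a smoothing projection rather than a selection.

```python
import torch

x = torch.randn(8, 3)  # (grid points, variables)

# Filtering: select a subset of grid points; the values are untouched.
mask = torch.tensor([0, 2, 4, 6])
filtered = x[mask]                        # shape (4, 3)

# Truncation: each coarse point is a weighted average over fine points,
# i.e. a low-pass projection rather than a selection.
trunc = torch.full((4, 8), 1.0 / 8)
truncated = trunc @ x                     # shape (4, 3), smoothed values
```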

Collaborator

@jakob-schloer jakob-schloer left a comment


Great work! I've left a couple of minor comments. I also think that some more tests and a small section in the documentation would be great to have for this.

- loss = torch.zeros(1, dtype=batch.dtype, device=self.device, requires_grad=False)
+ loss = torch.zeros(1, dtype=batch[0].dtype, device=self.device, requires_grad=False)

  if self.loss.name == "MultiscaleLossWrapper":

Do we have this if-block because we later want to track the losses for the different scales? In my opinion, we should avoid adding complexity to the Trainer class for the callbacks. Could we not move this logic to the RolloutEval class in diagnostics.callbacks.evaluation?


JPXKQX commented Nov 20, 2025

PR #670 has now been merged into main. This includes the SparseProjector class, which could be useful here. Please let me know if you would like any help with this.
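For reference, applying a sparse truncation/projection matrix in plain PyTorch looks roughly like this (a generic sketch, not the SparseProjector API from PR #670):

```python
import torch

# Sparse (coarse x fine) projection matrix: each coarse point averages two fine points.
indices = torch.tensor([[0, 0, 1, 1],     # coarse-grid row indices
                        [0, 1, 2, 3]])    # fine-grid column indices
values = torch.tensor([0.5, 0.5, 0.5, 0.5])
trunc = torch.sparse_coo_tensor(indices, values, size=(2, 4)).coalesce()

x = torch.randn(4, 3)                  # (fine grid points, variables)
x_coarse = torch.sparse.mm(trunc, x)   # (2, 3) coarse-scale fields
```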

@mchantry mchantry removed the ATS Approval Needed Approval needed by ATS label Nov 27, 2025