Difference in ASE FrechetCellFilter vs Torch-Sim (both md_flavors) + enhanced convergence testing
#200
Conversation
Thanks for continuing to look into this. I wonder whether it could be due to numerical instability in some of the matrix exponential/log functions (cf. scipy), as these had to be implemented in torch due to lacking first-party support. The other thing to consider is how frequently the cell is updated for the calculation of the deformation gradient. For us the original cell is kept throughout; I believe I checked with @lan496 that we do match ASE here, but that's the main thing I view as a little suspect in the implementation.
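To illustrate the numerical-instability concern: a hand-rolled truncated-series matrix exponential (the kind of routine one ends up writing when a framework lacks first-party expm/logm support) agrees with a stable eigendecomposition route for small strains, but drifts badly once the argument grows. This is an illustrative numpy sketch, not torch-sim's actual implementation:

```python
import numpy as np

def expm_taylor(A, terms=20):
    # Naive truncated Taylor series for exp(A); accuracy degrades
    # quickly as ||A|| grows because the tail of the series is dropped.
    out = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        out = out + term
    return out

def expm_eigh(A):
    # Stable exponential of a *symmetric* matrix via eigendecomposition.
    w, V = np.linalg.eigh(A)
    return (V * np.exp(w)) @ V.T

# Small symmetric strain (a typical cell deformation): both routes agree.
A = 0.01 * np.array([[1.0, 0.2, 0.0],
                     [0.2, 0.5, 0.1],
                     [0.0, 0.1, 2.0]])
err_small = np.abs(expm_taylor(A) - expm_eigh(A)).max()

# Large argument: the truncated series has not converged and drifts badly.
B = 500.0 * A
err_large = np.abs(expm_taylor(B) - expm_eigh(B)).max()
```

The same failure mode applies to series-based matrix logarithms, which are even more delicate near singular deformation gradients.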
> initial_energies = model(state)["energy"]

> def run_optimization(
Would it maybe make sense to split this into two functions, run_optimization_ts and run_optimization_ase, as it's already effectively two separate functions with a toggle?
You bet! I'll take a deeper look tomorrow/Monday to see what the differences are in the implementation. Thanks for your comments. Is it alright to continue using this PR to debug/track the issue?
Yep! Actually, looking at this again: for position-only relaxation ASE still takes half the steps. Maybe that suggests that we're taking half steps for their full steps in the VV half step?
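A toy model of that factor-of-two hypothesis (not the actual FIRE/velocity-Verlet code): in an overdamped relaxation toward a harmonic minimum, a stray 1/2 in the position update roughly doubles the number of steps needed to reach the same force tolerance.

```python
def steps_to_converge(step_scale, x0=1.0, tol=1e-3, max_steps=10_000):
    # Overdamped relaxation in a 1D harmonic well: x <- x - step_scale * x.
    # Returns the number of steps until |x| (proportional to the force)
    # first drops below tol.
    x = x0
    for n in range(1, max_steps + 1):
        x -= step_scale * x
        if abs(x) < tol:
            return n
    return max_steps

n_full = steps_to_converge(0.10)  # intended position update
n_half = steps_to_converge(0.05)  # same update with a stray factor of 1/2
# n_half comes out at roughly twice n_full
```

If torch-sim were applying dt/2 where ASE applies dt in the position drift, this is exactly the kind of ~2x step-count ratio one would expect to see.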
> dr_cell = cell_dt * state.cell_velocities

> # 6. Clamp to max_step
> # Atoms
In the ASE implementation, rather than norm for these comparisons, it uses np.vdot (https://wiki.fysik.dtu.dk/ase/_modules/ase/optimize/fire.html). np.vdot always returns a single scalar, so our use of norm versus their use of vdot might be a problem?
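For concreteness: np.vdot flattens both inputs and returns one scalar, so in a batched setting it mixes all structures into a single FIRE power P = F·v, whereas a per-structure reduction keeps them separate. An illustrative sketch (toy numbers, not the actual torch-sim code):

```python
import numpy as np

# Forces and velocities for a toy "batch" of two structures, 3 atoms each.
# Structure 0 moves downhill (F·v > 0); structure 1 moves uphill (F·v < 0).
forces = np.array([[[ 1.0, 0.0, 0.0]] * 3,
                   [[-1.0, 0.0, 0.0]] * 3])
velocities = np.array([[[0.5, 0.0, 0.0]] * 3,
                       [[0.5, 0.0, 0.0]] * 3])

# ASE-style: np.vdot flattens everything into one scalar power.
p_global = np.vdot(forces, velocities)  # 1.5 + (-1.5) = 0.0

# A batched FIRE needs one power per structure to decide who gets quenched.
p_batched = np.einsum("bij,bij->b", forces, velocities)  # [1.5, -1.5]
```

Here the global scalar is exactly zero, hiding the fact that one structure should be quenched and the other should not, which is why a naive norm- or global-vdot-based criterion can diverge from ASE's single-structure behaviour.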
Explores the potential fix proposed in #200 (comment), but it doesn't seem to help.
Dear all, since I don't have permission to push to this PR, I'll share some results here and point you to my branch.

Interestingly, the performance on your tests also differs significantly when choosing a … I still have some old labels in the plots below (sorry) and only considered ase_fire for now (so don't confuse the colours with the ones in your results).
All these small divergences are definitely worth tracking down! For the position-only relaxations we should be able to reach a point where we relax to the same structure in the same number of steps as ASE, which is ultimately our end goal, so that the decision to use torch_sim is driven only by whether you're unable to saturate your GPU hardware and therefore want batching or duplication.

@janosh implemented https://github.com/Radical-AI/torch-sim/tree/batched-vdot after I saw this issue, and we didn't see a big difference over there, so we likely still have an issue in how we applied this.

This is good to know. I agree that the bigger issue now feels like it's in the position updates, as there's no good reason why the torch-sim implementation should need more steps beyond subtle bugs/differences in our code; 30% more steps is quite significant. I would suggest we merge @mstapelberg's PR here so that we lock in the improvements to these comparison scripts, and then you open another PR based on your branch, rebased/reset wrt https://github.com/Radical-AI/torch-sim/tree/batched-vdot, and we continue to debug there?
This plan sounds good to me! Sorry, my bandwidth this week has been low with a looming committee meeting. Thanks all for your help with this.
Summary
I updated the 7.6 script in examples to test the FrechetCellFilter in both torch-sim and ASE. It generates plots for each of the 6 structures in the batch. Through this I have identified that ASE consistently converges in fewer steps in every case, even when I adopt the ase_fire md_flavor. The script tests 6 structures of (3,2,2) supercells:
Here is the comparison of convergence steps needed for each of the 6 structures:

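One way to make the step-count comparison precise is to count optimizer steps until fmax first drops below the convergence threshold. A hypothetical helper (the fmax_tol value and force histories below are made up for illustration, not taken from the actual runs):

```python
import numpy as np

def steps_to_fmax(fmax_history, fmax_tol=0.05):
    # Number of optimizer steps until the max per-atom force norm first
    # drops below fmax_tol; returns len(history) if it never converges.
    fmax_history = np.asarray(fmax_history)
    below = np.nonzero(fmax_history < fmax_tol)[0]
    return int(below[0]) + 1 if below.size else len(fmax_history)

# Made-up fmax trajectories for one structure under the two optimizers.
ase_steps = steps_to_fmax([0.8, 0.3, 0.1, 0.04, 0.01])
ts_steps = steps_to_fmax([0.8, 0.5, 0.3, 0.2, 0.1, 0.06, 0.04])
extra = ts_steps / ase_steps - 1.0  # fractional extra steps vs the ASE base case
```

Reporting the fractional extra steps per structure like this would make the "~30% more steps" claim directly comparable across the 6 test cases.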
Here are the average displacements for each structure compared to ASE:

Here are the final energies compared to their ASE base case:

Seems like there is an issue here?