Conversation

@mstapelberg (Contributor)
@mstapelberg mstapelberg commented May 17, 2025

Summary

I updated the 7.6 script in examples to test the FrechetCellFilter in both torch-sim and ASE. It generates plots for each of the six structures in the batch. Through this I found that ASE consistently converges in fewer steps in every case, even when I use the ase_fire md_flavor.

The script tests 6 structures of (3,2,2) supercells:

  1. Pure silicon: 96 atoms
  2. Pure copper: 36 atoms
  3. Pure iron: 36 atoms
  4. Silicon with 1 vacancy: 95 atoms
  5. Copper with 1 vacancy: 35 atoms
  6. Iron with 1 vacancy: 35 atoms

Here is the comparison of convergence steps needed for each of the 6 structures:
[figure: Convergence_Steps_Per_Structure]

Here are the average displacements for each structure compared to ASE:
[figure: Mean_Displacement_Comp]

Here are the final energies compared to their ASE base case:
[figure: Compare_Final_Energy]

Seems like there is an issue here?

@cla-bot cla-bot bot added the cla-signed Contributor license agreement signed label May 17, 2025
@CompRhys (Member)

Thanks for continuing to look into this. I wonder whether it could be due to numerical instability in some of the matrix exponential/logarithm functions (cf. scipy), as these had to be implemented in torch due to lacking first-party support. The other thing to consider is how often the cell is updated for the calculation of the deformation gradient. For us the original cell is kept throughout; I believe I checked with @lan496 that we do match ASE here, but that's the main thing I view as a little suspect in the implementation.
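For context: torch ships torch.matrix_exp but no first-party matrix logarithm, so ports typically roll their own, e.g. via eigendecomposition, which is where instability can creep in for ill-conditioned cells. A numpy-only sketch of the round-trip, under an assumed near-identity deformation gradient:

```python
import numpy as np

def matrix_log_eig(A):
    """Matrix log via eigendecomposition -- the kind of hand-rolled
    implementation a torch port might use, since torch lacks logm."""
    w, V = np.linalg.eig(A)
    return (V @ np.diag(np.log(w)) @ np.linalg.inv(V)).real

def matrix_exp_series(A, terms=30):
    """Plain Taylor series for exp(A), for comparison."""
    out = np.eye(len(A))
    term = np.eye(len(A))
    for k in range(1, terms):
        term = term @ A / k
        out = out + term
    return out

rng = np.random.default_rng(0)
A = np.eye(3) + 0.05 * rng.standard_normal((3, 3))  # near-identity deformation
L = matrix_log_eig(A)
roundtrip_err = np.max(np.abs(matrix_exp_series(L) - A))
print(roundtrip_err)  # tiny here; can grow badly for ill-conditioned cells
```

Near the identity the round-trip is benign; the instability question is what happens for strongly sheared or nearly singular cells, where eigendecomposition-based logs lose accuracy.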

initial_energies = model(state)["energy"]


def run_optimization(
Review comment (Member):

Maybe make this two different functions, run_optimization_ts and run_optimization_ase, since it's effectively already two separate functions behind a toggle?
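A hypothetical sketch of that split, with placeholder bodies (the names beyond run_optimization_ts/run_optimization_ase and the shared driver are illustrative, not the script's actual API):

```python
from typing import Callable

def _converged_steps(step_fn: Callable[[], float], fmax: float, max_steps: int) -> int:
    """Shared driver: step until the reported max force drops below fmax."""
    for step in range(1, max_steps + 1):
        if step_fn() < fmax:
            return step
    return max_steps

def run_optimization_ts(step_fn, fmax=0.01, max_steps=500):
    # a torch-sim FrechetCellFilter + FIRE step would live behind step_fn
    return _converged_steps(step_fn, fmax, max_steps)

def run_optimization_ase(step_fn, fmax=0.01, max_steps=500):
    # an ASE FrechetCellFilter + FIRE step would live behind step_fn
    return _converged_steps(step_fn, fmax, max_steps)

# Toy usage: a "max force" that decays with each step
forces = iter([1.0, 0.5, 0.1, 0.005])
print(run_optimization_ts(lambda: next(forces)))  # 4
```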

@mstapelberg (Contributor, Author)

> Thanks for continuing to look into this, I wonder about whether it could be due to numerical instability in some of the matrix exponential/log functions […]

You bet! I'll take a deeper look tomorrow/Monday to see what the differences are in the implementation. Thanks for your comments. Is it alright to continue using this PR to debug/track the issue?

@CompRhys (Member)

> Is it alright to continue using this PR to debug/track the issue?

Yep! Actually, looking at this again: for position-only relaxations, ASE still takes half the steps. Maybe that suggests we're taking half steps where they take full steps in the velocity-Verlet half-step?
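A toy illustration of why a stray half step would inflate the step count: damped velocity Verlet on a 1-D harmonic well, once with the proper drift and once with the drift accidentally halved. This is purely illustrative, not torch-sim's actual update:

```python
def vv_minimize(dt, drift_scale=1.0, tol=1e-6, max_steps=100_000):
    """Damped velocity Verlet on E(x) = x**2 / 2 (so force = -x).
    drift_scale=0.5 mimics accidentally using a half step for the
    position (drift) update while keeping full kicks."""
    x, v = 1.0, 0.0
    for step in range(1, max_steps + 1):
        v = 0.9 * (v - 0.5 * dt * x)   # first damped half-kick
        x = x + drift_scale * dt * v   # drift (possibly halved)
        v = 0.9 * (v - 0.5 * dt * x)   # second damped half-kick
        if abs(x) < tol:               # |force| = |x| convergence check
            return step
    return max_steps

full = vv_minimize(dt=0.1)
half = vv_minimize(dt=0.1, drift_scale=0.5)
print(full, half)  # the halved drift needs noticeably more steps
```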

dr_cell = cell_dt * state.cell_velocities

# 6. Clamp to max_step
# Atoms
Review comment (Member):

In the ASE implementation, rather than norm, it uses np.vdot for these comparisons: https://wiki.fysik.dtu.dk/ase/_modules/ase/optimize/fire.html. np.vdot always returns a scalar, so our use of norm where they use vdot might be a problem?
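To make the concern concrete, here is what the difference could look like in ASE's FIRE velocity-mixing step. The per-atom-norm variant below is a hypothetical illustration of the alternative, not necessarily what torch-sim does:

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.standard_normal((8, 3))  # forces for one 8-atom structure
v = rng.standard_normal((8, 3))  # velocities
a = 0.1                          # FIRE mixing parameter

# ASE-style: np.vdot flattens both arrays and returns a single scalar
v_vdot = (1 - a) * v + a * f * np.sqrt(np.vdot(v, v) / np.vdot(f, f))

# Hypothetical per-atom norms: each atom gets its own scaling factor
scale = np.linalg.norm(v, axis=1, keepdims=True) / np.linalg.norm(f, axis=1, keepdims=True)
v_norm = (1 - a) * v + a * f * scale

diff = np.max(np.abs(v_vdot - v_norm))
print(diff)  # nonzero: the two updates genuinely differ
```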

@janosh janosh added the bug Something isn't working label May 21, 2025
janosh added a commit that referenced this pull request May 21, 2025
explores potential fix proposed in #200 (comment) but doesn't seem to help
@t-reents (Contributor)

Dear all,
I was also curious and looked into it further.

Since I don't have permissions to push to this PR, I'll share some results here and point you to my branch.

Interestingly, the performance on your tests also differs significantly between a 3x2x2 and a 2x2x2 supercell for Si; see the results below.
I'm not 100% sure yet what exactly is happening, but one of the fixes removes the large deviations for bulk Si and the other for Si with a vacancy.

I still have some old labels in the plots below (sorry) and only considered ase_fire for now, so don't confuse the colours with the ones in your results.

  • torch-sim ASE-FIRE-old corresponds to the current version on main
  • torch-sim ASE-FIRE-new (Frechet Cell) corresponds to _ase_fire_step_fixed_dt_and_scaling
  • torch-sim ASE-FIRE (Frechet Cell) corresponds to _ase_fire_step_fixed_dt
🔽 3x2x2 supercell

[figures fig1-fig3: comparison plots for the 3x2x2 supercell]

🔽 2x2x2 supercell

[figures fig1-fig3: comparison plots for the 2x2x2 supercell]

@CompRhys (Member)

CompRhys commented May 22, 2025

All these small divergences are definitely worth tracking down! For position-only relaxations we should be able to reach a point where we relax to the same structure in the same number of steps as ASE, which is ultimately our end goal: the decision to use torch_sim should then be driven purely by whether you're unable to saturate your GPU hardware and therefore want batching or duplication.

@janosh implemented https://github.com/Radical-AI/torch-sim/tree/batched-vdot after I saw this issue, and we didn't see a big difference over there, so we likely still have an issue in how we applied this.

> Moreover, it seems that the volumes are almost identical, i.e., the cell_velocities and cell_forces should be fine. The main differences are in the positions. This has to be investigated further.

This is good to know. I agree that the bigger issue now feels like the position updates, as there's no good reason for the torch-sim implementation to need more steps beyond subtle bugs/differences in our code. 30% more steps is quite significant.

I would suggest we merge @mstapelberg's PR here to lock in the improvements to these comparison scripts, and then you open another PR from your branch, rebased/reset with respect to https://github.com/Radical-AI/torch-sim/tree/batched-vdot, and we continue to debug there?

@mstapelberg (Contributor, Author)

This plan sounds good to me! Sorry my bandwidth this week has been low with a looming committee meeting.

Thanks all for your help with this.

@CompRhys CompRhys enabled auto-merge (squash) May 22, 2025 20:26
@CompRhys CompRhys merged commit ef5c912 into TorchSim:main May 22, 2025
95 checks passed
@janosh janosh changed the title Difference in ASE FrechetCellFilter vs Torch-Sim (both md_flavors) + enhanced convergence testing Jun 10, 2025
@janosh janosh added the tests Test all the things label Jun 10, 2025
coderabbitai bot commented Jun 10, 2025

Warning

Rate limit exceeded

@janosh has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 5 minutes and 47 seconds before requesting another review.


📥 Commits

Reviewing files that changed from the base of the PR and between b626d09 and 9fc4172.

📒 Files selected for processing (6)
  • examples/scripts/7_Others/7.6_Compare_ASE_to_VV_FIRE.py (4 hunks)
  • examples/tutorials/low_level_tutorial.py (1 hunks)
  • tests/models/test_graphpes.py (3 hunks)
  • torch_sim/models/sevennet.py (1 hunks)
  • torch_sim/optimizers.py (0 hunks)
  • torch_sim/properties/correlations.py (1 hunks)