Difference in ASE FrechetCellFilter vs Torch-Sim (both md_flavors) + enhanced convergence testing
#200
Conversation
Thanks for continuing to look into this. I wonder whether it could be due to numerical instability in some of the matrix exponential/log functions (cf. scipy), as these had to be implemented in torch due to lacking first-party support. The other thing to consider is how frequently the cell is updated for the calculation of the deformation gradient. For us the original cell is kept throughout; I believe I checked with @lan496 that we do match ASE here, but that's the main thing I view as a little suspect in the implementation.
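To illustrate the numerical-instability concern: a hand-rolled truncated-series matrix exponential (the kind of routine one ends up writing when a framework lacks first-party expm/logm support) agrees with a stable eigendecomposition route for small strains, but drifts badly once the argument grows. This is an illustrative numpy sketch, not torch-sim's actual implementation:

```python
import numpy as np

def expm_taylor(A, terms=20):
    # Naive truncated Taylor series for exp(A); accuracy degrades
    # quickly as ||A|| grows because the tail of the series is dropped.
    out = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        out = out + term
    return out

def expm_eigh(A):
    # Stable exponential of a *symmetric* matrix via eigendecomposition.
    w, V = np.linalg.eigh(A)
    return (V * np.exp(w)) @ V.T

# Small symmetric strain (a typical cell deformation): both routes agree.
A = 0.01 * np.array([[1.0, 0.2, 0.0],
                     [0.2, 0.5, 0.1],
                     [0.0, 0.1, 2.0]])
err_small = np.abs(expm_taylor(A) - expm_eigh(A)).max()

# Large argument: the truncated series has not converged and drifts badly.
B = 500.0 * A
err_large = np.abs(expm_taylor(B) - expm_eigh(B)).max()
```

The same failure mode applies to series-based matrix logarithms, which are even more delicate near singular deformation gradients.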
> initial_energies = model(state)["energy"]

> def run_optimization(
Would it maybe make sense to split this into two functions, run_optimization_ts and run_optimization_ase, as it's already effectively two separate functions with a toggle?
You bet! I'll take a deeper look tomorrow/Monday to see what the differences are in the implementation. Thanks for your comments. Is it alright to continue using this PR to debug/track the issue?
Yep! Actually, looking at this again: for position-only relaxation ASE still takes half the steps. Maybe that suggests that we're taking half steps for their full steps in the VV half step?
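A toy model of that factor-of-two hypothesis (not the actual FIRE/velocity-Verlet code): in an overdamped relaxation toward a harmonic minimum, a stray 1/2 in the position update roughly doubles the number of steps needed to reach the same force tolerance.

```python
def steps_to_converge(step_scale, x0=1.0, tol=1e-3, max_steps=10_000):
    # Overdamped relaxation in a 1D harmonic well: x <- x - step_scale * x.
    # Returns the number of steps until |x| (proportional to the force)
    # first drops below tol.
    x = x0
    for n in range(1, max_steps + 1):
        x -= step_scale * x
        if abs(x) < tol:
            return n
    return max_steps

n_full = steps_to_converge(0.10)  # intended position update
n_half = steps_to_converge(0.05)  # same update with a stray factor of 1/2
# n_half comes out at roughly twice n_full
```

If torch-sim were applying dt/2 where ASE applies dt in the position drift, this is exactly the kind of ~2x step-count ratio one would expect to see.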
> dr_cell = cell_dt * state.cell_velocities

> # 6. Clamp to max_step
> # Atoms
In the ASE implementation, rather than norm for these comparisons, it uses np.vdot (https://wiki.fysik.dtu.dk/ase/_modules/ase/optimize/fire.html). np.vdot always returns a single scalar, so our use of norm versus their use of vdot might be a problem?
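For concreteness: np.vdot flattens both inputs and returns one scalar, so in a batched setting it mixes all structures into a single FIRE power P = F·v, whereas a per-structure reduction keeps them separate. An illustrative sketch (toy numbers, not the actual torch-sim code):

```python
import numpy as np

# Forces and velocities for a toy "batch" of two structures, 3 atoms each.
# Structure 0 moves downhill (F·v > 0); structure 1 moves uphill (F·v < 0).
forces = np.array([[[ 1.0, 0.0, 0.0]] * 3,
                   [[-1.0, 0.0, 0.0]] * 3])
velocities = np.array([[[0.5, 0.0, 0.0]] * 3,
                       [[0.5, 0.0, 0.0]] * 3])

# ASE-style: np.vdot flattens everything into one scalar power.
p_global = np.vdot(forces, velocities)  # 1.5 + (-1.5) = 0.0

# A batched FIRE needs one power per structure to decide who gets quenched.
p_batched = np.einsum("bij,bij->b", forces, velocities)  # [1.5, -1.5]
```

Here the global scalar is exactly zero, hiding the fact that one structure should be quenched and the other should not, which is why a naive norm- or global-vdot-based criterion can diverge from ASE's single-structure behaviour.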
Explores the potential fix proposed in #200 (comment), but it doesn't seem to help.
Dear all, since I don't have permission to push to this PR, I'll share some results here and point you to my branch.

Interestingly, the performance on your tests also differs significantly when choosing a … I still have some old labels in the plots below (sorry) and only considered ase_fire for now (so don't confuse the colours with the ones in your results).
All these small divergences are definitely worth tracking down! For the position-only relaxations we should be able to reach a point where we relax to the same structure in the same number of steps as ASE, which is ultimately our end goal, so that the decision to use torch_sim is driven only by whether you're unable to saturate your GPU hardware and therefore want batching or duplication.

@janosh implemented https://github.com/Radical-AI/torch-sim/tree/batched-vdot after I saw this issue, and we didn't see a big difference over there, so we likely still have an issue in how we applied this.

This is good to know. I agree that the bigger issue now feels like it's in the position updates, as there's no good reason why the torch-sim implementation should need more steps beyond subtle bugs/differences in our code; 30% more steps is quite significant. I would suggest we merge @mstapelberg's PR here so that we lock in the improvements to these comparison scripts, and then you open another PR based on your branch, rebased/reset wrt https://github.com/Radical-AI/torch-sim/tree/batched-vdot, and we continue to debug there?
This plan sounds good to me! Sorry, my bandwidth this week has been low with a looming committee meeting. Thanks all for your help with this.
Summary
I updated the 7.6 script in examples to test the FrechetCellFilter in both torch-sim and ASE. It generates plots for each of the 6 structures in the batch. Through this I have identified that ASE consistently converges in fewer steps in every case, even when I adopt the ase_fire md_flavor. The script tests 6 structures of (3,2,2) supercells:
Here is the comparison of convergence steps needed for each of the 6 structures:

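One way to make the step-count comparison precise is to count optimizer steps until fmax first drops below the convergence threshold. A hypothetical helper (the fmax_tol value and force histories below are made up for illustration, not taken from the actual runs):

```python
import numpy as np

def steps_to_fmax(fmax_history, fmax_tol=0.05):
    # Number of optimizer steps until the max per-atom force norm first
    # drops below fmax_tol; returns len(history) if it never converges.
    fmax_history = np.asarray(fmax_history)
    below = np.nonzero(fmax_history < fmax_tol)[0]
    return int(below[0]) + 1 if below.size else len(fmax_history)

# Made-up fmax trajectories for one structure under the two optimizers.
ase_steps = steps_to_fmax([0.8, 0.3, 0.1, 0.04, 0.01])
ts_steps = steps_to_fmax([0.8, 0.5, 0.3, 0.2, 0.1, 0.06, 0.04])
extra = ts_steps / ase_steps - 1.0  # fractional extra steps vs the ASE base case
```

Reporting the fractional extra steps per structure like this would make the "~30% more steps" claim directly comparable across the 6 test cases.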
Here are the average displacements for each structure compared to ASE:

Here are the final energies compared to their ASE base case:

Seems like there is an issue here?