Small improvements to early nuts behaviour #5824
Codecov Report
@@           Coverage Diff            @@
##            main    #5824     +/-   ##
=========================================
- Coverage   89.40%   89.38%    -0.03%
=========================================
  Files          74       74
  Lines       13772    13771        -1
=========================================
- Hits        12313    12309        -4
- Misses       1459     1462        +3
looks good!
two questions:
- is the change from log_weighted_accept_sum to log_accept_sum what is making the change in step size adaptation? I am having trouble seeing that.
- these changes are careful enough that I might ask for, say, 10k samples from a random 100d gaussian to show that it still provides unbiased samples.
is this for the future?
That got here by accident...
I can take it out, but it is useful for the new code in covadapt (which I think is pretty nice, actually, and I'll try to get it into pymc at some point).
If you want to leave it in, I'm OK with it (so long as it has a comment!). I wonder what you think of algorithm 2 in https://proceedings.mlr.press/v151/hoffman22a/hoffman22a.pdf as a means of estimating scales?
Thomas sent me a link to that paper just this morning. I also wonder what I'll think about algorithm 2; it looks fascinating, though. ;-)
If you'd like to compare ideas with what I did in covadapt, I just wrote a sketch of an intro in the readme: https://github.com/aseyboldt/covadapt
If you don't understand what I'm talking about over there, that's my fault. I'll try to improve it soon. :-)
Yes. Maybe it is easier to see if you only look at the first commit.
That should also be covered in the tests already (https://github.com/pymc-devs/pymc/blob/main/pymc/tests/test_posteriors.py), but you are right to ask; I'll also run that manually to make sure. (Also, we have to run the long-running test manually.)
Notebook with the promised verification of the posterior: https://gist.github.com/aseyboldt/43354eded52981340304d3aec94ced3c
Can we include that as a slow test (which will not run by default)?
@ColCarroll can we get a binary review from you :D?
apologies for the wait, and thanks for the verification, @aseyboldt!
Thanks @aseyboldt and @ColCarroll!
I guess this is an enhancement of the NUTS algorithm, so it will change results for NUTS users (i.e., most users), we assume for the better. My impression (defer to Adrian!) is that these changes should be fairly subtle, or only matter for the most careful or difficult models, so it does not require (say) a minor version release (which I'd suggest doing if the old algorithm was arguably wrong) and can go out whenever the next release was planned anyway.
Step size adaptation:
Currently, we compute the acceptance rate of a draw using only the part of the trajectory that is accepted at the end. This way we discard additional information about energy errors in the remaining, rejected part. After this PR, we compute the mean acceptance rate over all leapfrog steps we take.
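The difference can be sketched in a few lines of numpy. The array names and values below are purely illustrative, not the actual PyMC internals:

```python
import numpy as np

# Hypothetical per-leapfrog-step acceptance probabilities from one NUTS
# trajectory: min(1, exp(H0 - H)) at each step.
accept_probs = np.array([0.95, 0.90, 0.40, 0.85, 0.99, 0.20])
# Which steps ended up in the subtree that is accepted at the end
# (illustrative mask).
in_accepted_subtree = np.array([True, True, False, False, True, True])

# Old behaviour (sketch): average only over the subtree kept at the end,
# discarding energy-error information from the rejected part.
old_stat = accept_probs[in_accepted_subtree].mean()

# New behaviour (sketch): average over every leapfrog step taken, so the
# rejected part of the trajectory still informs step size adaptation.
new_stat = accept_probs.mean()
```

With these made-up numbers, the two statistics differ (0.76 vs. 0.715), so the dual-averaging step size adaptation would see a different signal.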
Divergences:
Currently, we count large deviations from the initial energy as a divergence regardless of the direction of the energy error.
This PR changes this so that only large energy errors that correspond to a low acceptance probability are considered a divergence. I think this helps with some models in the initial phase of sampling, where with the old behavior we would stop promising trajectories that are "too good", leading to worse performance.
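A minimal sketch of the two divergence criteria, in terms of the initial energy H0 and the current energy H (the threshold value and function names are illustrative, not PyMC's actual code):

```python
MAX_ENERGY_ERROR = 1000.0  # illustrative divergence threshold


def is_divergence_old(H0, H):
    # Old behaviour (sketch): any large deviation from the initial
    # energy counts as a divergence, regardless of sign.
    return abs(H - H0) > MAX_ENERGY_ERROR


def is_divergence_new(H0, H):
    # New behaviour (sketch): only energy errors that imply a very low
    # acceptance probability count. Since log p_accept ~ H0 - H, only
    # a large *increase* in energy (H >> H0) triggers a divergence;
    # a large energy drop ("too good" a trajectory) does not.
    return (H0 - H) < -MAX_ENERGY_ERROR
```

Under the old rule a trajectory whose energy drops far below H0 is cut short; under the new rule it keeps going.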
Also a little bit of housekeeping:
This introduces an additional sampler statistic: index_in_trajectory. I don't see much of a use case for this other than teaching and debugging, but it proved useful while comparing the sampler to other implementations.
I also introduced some array copies for the integrated momentum array. I thought I saw bugs while comparing sampler behavior to nuts-rs, but somehow I can't reproduce those now. Still, I think it's better to be safe.