-
Notifications
You must be signed in to change notification settings - Fork 4.4k
Fix get_value_estimate and buffer append #2276
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
value_next = self.policy.get_value_estimates( | ||
bootstrapping_info, | ||
idx, | ||
info.local_done[l] and not info.max_reached[l], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to determine this just from bootstrapping_info
and idx
within get_value_estimates()
? I couldn't quite convince myself when I was looking at it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I'm not sure - it seems like bootstrapping_info
becomes the previous info
if the not
condition is met. Seems like we'll run into an issue if both of those conditions are met, since bootstrapping_info
will be sth different and we can't figure out if info.local_done
is True.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, that's more or less what I thought. Sounds good as it is!
Note that this should fix #1798 |
return run_out | ||
|
||
def get_value_estimates(self, brain_info, idx): | ||
def get_value_estimates(self, brain_info, idx, done): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type annotations if it's not too hard to add?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added type annotation and test for this function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LTGM, left some optional feedback.
* develop: (69 commits) Add different types of visual encoder (nature cnn/resnet) Make SubprocessEnvManager take asynchronous steps (#2265) update mypy version one more unused remove unused variables Fix respawn part of BananaLogic (#2277) fix whitespace and line breaks remove codacy (#2287) Ported documentation from other branch tennis reset parameter implementation ported over Fixed the default value to match the value in the docs two soccer reset parameter implementation ported over 3D ball reset parameter implementation ported over 3D ball reset parameter implementation ported over Relax the cloudpickle version restriction (#2279) Fix get_value_estimate and buffer append (#2276) fix lint checks Add Unity command line arguments Swap 0 set and reward buffer append (#2273) GAIL and Pretraining (#2118) ...
In newer versions of numpy (>1.14), the fact that we convert a np array into a list, append an np array, and convert it back to a list in
get_gae
causes problems.This commit changes
get_value_estimates
to output adict
of floats rather than np arrays, and uses np.append rather than converting to a list and back to resolve this issue.