@ervteng ervteng commented Jul 17, 2019

In newer versions of numpy (>1.14), the fact that we convert a np array into a list, append an np array, and convert it back to a list in get_gae causes problems.

This commit changes get_value_estimates to output a dict of floats rather than np arrays, and uses np.append rather than converting to a list and back to resolve this issue.
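
A minimal sketch of the np.append approach described above. The helper name append_bootstrap_value and the sample numbers are hypothetical and are not taken from this PR's code. The sketch only illustrates that np.append yields a flat float array whether the appended value is a plain float or a one-element array, which avoids the list round trip.

```python
import numpy as np


def append_bootstrap_value(value_estimates: np.ndarray, value_next: float) -> np.ndarray:
    """Return a flat array with value_next appended (hypothetical helper)."""
    # With no axis given, np.append flattens both arguments, so a Python
    # float, a NumPy scalar, or a one-element array all give a 1-D result.
    return np.append(value_estimates, value_next)


values = np.array([0.1, 0.2, 0.3])  # illustrative numbers only
extended = append_bootstrap_value(values, 0.5)
print(extended.shape, extended.dtype)
```

Because the result is already an ndarray, slicing such as extended[1:] and extended[:-1] works directly, with no conversion back from a list.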

@ervteng ervteng requested review from awjuliani and chriselion July 17, 2019 00:53
value_next = self.policy.get_value_estimates(
bootstrapping_info,
idx,
info.local_done[l] and not info.max_reached[l],
Contributor

Is it possible to determine this just from bootstrapping_info and idx within get_value_estimates()? I couldn't quite convince myself when I was looking at it.

Contributor Author

Hmm, I'm not sure. It seems like bootstrapping_info becomes the previous info if the "not" condition is met. We would run into an issue if both of those conditions are met, since bootstrapping_info would then be something different and we couldn't tell whether info.local_done is True.

Contributor

OK, that's more or less what I thought. Sounds good as it is!

@chriselion
Contributor

Note that this should fix #1798

return run_out

- def get_value_estimates(self, brain_info, idx):
+ def get_value_estimates(self, brain_info, idx, done):
Contributor

Could you add type annotations, if it's not too hard?

Contributor Author

Added a type annotation and a test for this function.
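
For illustration, a typed helper that returns a dict of floats and takes a done flag might look roughly like the sketch below. This is not the code from this PR: the function name to_float_estimates, the reward-signal keys, and the choice to zero every estimate when done is True are all assumptions made for the example.

```python
from typing import Dict

import numpy as np


def to_float_estimates(raw: Dict[str, np.ndarray], done: bool) -> Dict[str, float]:
    """Convert per-signal value outputs to plain floats (hypothetical helper)."""
    # Assumption: each raw entry holds exactly one value for one agent.
    estimates = {
        name: float(np.asarray(value).reshape(-1)[0]) for name, value in raw.items()
    }
    if done:
        # Assumption: a finished episode has no future return to bootstrap from.
        estimates = {name: 0.0 for name in estimates}
    return estimates


# Illustrative keys and numbers only.
raw_outputs = {"extrinsic": np.array([[1.5]]), "curiosity": np.array([0.25])}
print(to_float_estimates(raw_outputs, done=False))
print(to_float_estimates(raw_outputs, done=True))
```

Returning plain floats means callers can pass each value straight to np.append without first checking its array shape.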

@chriselion chriselion left a comment

LGTM, left some optional feedback.

@ervteng ervteng merged commit c025086 into develop Jul 17, 2019
@ervteng ervteng deleted the develop-fixgetvalueest branch July 17, 2019 17:37
mantasp added a commit that referenced this pull request Jul 22, 2019
* develop: (69 commits)
  Add different types of visual encoder (nature cnn/resnet)
  Make SubprocessEnvManager take asynchronous steps (#2265)
  update mypy version
  one more unused
  remove unused variables
  Fix respawn part of BananaLogic (#2277)
  fix whitespace and line breaks
  remove codacy (#2287)
  Ported documentation from other branch
  tennis reset parameter implementation ported over
  Fixed the default value to match the value in the docs
  two soccer reset parameter implementation ported over
  3D ball reset parameter implementation ported over
  3D ball reset parameter implementation ported over
  Relax the cloudpickle version restriction (#2279)
  Fix get_value_estimate and buffer append (#2276)
  fix lint checks
  Add Unity command line arguments
  Swap 0 set and reward buffer append (#2273)
  GAIL and Pretraining (#2118)
  ...
@chriselion chriselion mentioned this pull request Jul 23, 2019
@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 18, 2021