Fix get_value_estimate and buffer append #2276

ervteng · 2019-07-17T00:53:33Z

In newer versions of numpy (>1.14), get_gae breaks because we convert an np array into a list, append an np array to it, and then convert the result back to an np array.

This commit changes get_value_estimates to output a dict of floats rather than np arrays, and uses np.append instead of converting to a list and back, which resolves the issue.
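
For illustration, the np.append pattern described above can be sketched as follows. This is a minimal sketch and not the actual code in get_gae: the helper name `append_bootstrap` and the sample values are hypothetical, and it assumes the value estimates are a 1-D float array and the bootstrap value is a plain float.

```python
import numpy as np


def append_bootstrap(values: np.ndarray, value_next: float) -> np.ndarray:
    """Return a new 1-D array with value_next added at the end.

    Hypothetical helper: stays in numpy the whole time instead of
    converting to a list, appending, and converting back.
    """
    # np.append returns a copy; `values` itself is left unchanged.
    return np.append(values, value_next)


# Made-up per-step value estimates, followed by a bootstrap value of 0.0.
value_estimates = np.array([0.5, 0.25, 0.125])
extended = append_bootstrap(value_estimates, 0.0)
```

Because the appended value is a plain float rather than an np array, the result stays a flat numeric array, which is presumably why the dict-of-floats change and the np.append change go together.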

chriselion · 2019-07-17T01:55:47Z

ml-agents/mlagents/trainers/ppo/trainer.py

+                value_next = self.policy.get_value_estimates(
+                    bootstrapping_info,
+                    idx,
+                    info.local_done[l] and not info.max_reached[l],


Is it possible to determine this just from bootstrapping_info and idx within get_value_estimates()? I couldn't quite convince myself when I was looking at it.

Hmm, I'm not sure. It seems that bootstrapping_info becomes the previous info if the `not` condition is met. We would run into an issue if both of those conditions are met, since bootstrapping_info would be something different and we couldn't tell whether info.local_done is True.

OK, that's more or less what I thought. Sounds good as it is!
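
To make the thread above concrete, here is a rough sketch of the caller computing the flag and passing it in, with the callee returning a dict of floats. The names `is_terminal` and `to_value_estimates`, and the reward-signal keys, are hypothetical. Zeroing the estimates when `done` is True is an assumption about typical bootstrapping, not something this thread confirms.

```python
from typing import Dict

import numpy as np


def is_terminal(local_done: bool, max_reached: bool) -> bool:
    """Flag computed by the caller: done, but not because of the step limit."""
    return local_done and not max_reached


def to_value_estimates(raw: Dict[str, np.ndarray], done: bool) -> Dict[str, float]:
    """Convert per-signal network outputs to plain floats.

    Assumption (not confirmed by the PR): a true terminal state has no
    future return, so its value estimate is set to 0.0.
    """
    estimates = {name: float(np.squeeze(value)) for name, value in raw.items()}
    if done:
        estimates = {name: 0.0 for name in estimates}
    return estimates


# Made-up outputs for two hypothetical reward signals.
raw = {"extrinsic": np.array([[0.5]]), "curiosity": np.array([0.25])}
timed_out = to_value_estimates(raw, is_terminal(True, True))
finished = to_value_estimates(raw, is_terminal(True, False))
```

Keeping the flag as an explicit argument means the function does not have to infer it from bootstrapping_info, which, per the discussion, may not carry enough information.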

chriselion · 2019-07-17T01:57:04Z

Note that this should fix #1798

chriselion · 2019-07-17T01:57:37Z

ml-agents/mlagents/trainers/ppo/policy.py

        return run_out

-    def get_value_estimates(self, brain_info, idx):
+    def get_value_estimates(self, brain_info, idx, done):


type annotations if it's not too hard to add?

Added type annotation and test for this function

chriselion

LGTM, left some optional feedback.

* develop: (69 commits)
  Add different types of visual encoder (nature cnn/resnet)
  Make SubprocessEnvManager take asynchronous steps (#2265)
  update mypy version
  one more unused
  remove unused variables
  Fix respawn part of BananaLogic (#2277)
  fix whitespace and line breaks
  remove codacy (#2287)
  Ported documentation from other branch
  tennis reset parameter implementation ported over
  Fixed the default value to match the value in the docs
  two soccer reset parameter implementation ported over
  3D ball reset parameter implementation ported over
  3D ball reset parameter implementation ported over
  Relax the cloudpickle version restriction (#2279)
  Fix get_value_estimate and buffer append (#2276)
  fix lint checks
  Add Unity command line arguments
  Swap 0 set and reward buffer append (#2273)
  GAIL and Pretraining (#2118)
  ...

Ervin Teng added 3 commits July 16, 2019 16:29

make get_value_estimates output a dict of floats

5179b84

Make return cleaner in get_value_estimates

dcf4d46

Use np.append instead of convert to list, unconvert

94d66d5

ervteng requested review from awjuliani and chriselion July 17, 2019 00:53

chriselion reviewed Jul 17, 2019

chriselion approved these changes Jul 17, 2019

Ervin Teng added 2 commits July 17, 2019 10:19

Add type checks and test for get_value_estimates

1872e43

Add type for feed_dict

8755678

awjuliani approved these changes Jul 17, 2019

ervteng merged commit c025086 into develop Jul 17, 2019

ervteng deleted the develop-fixgetvalueest branch July 17, 2019 17:37

chriselion mentioned this pull request Jul 23, 2019

Update buffer.py #2251

Closed

github-actions bot locked as resolved and limited conversation to collaborators May 18, 2021
