Skip to content

Conversation

@awjuliani
Copy link
Contributor

@awjuliani awjuliani commented Apr 4, 2018

Value estimate bootstrapping upon episode timeout (ie max_steps reached) was incorrectly using observation from new episode, rather than final observation from current episode. This PR addresses this, and improves performance for environments in which the "end of the episode" is an arbitrary timeout (such as 3DBall, Reacher, and Crawler).

Copy link
Contributor

@vincentpierre vincentpierre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not happy about changing process_experiences, might break the trainers someone else implemented

@awjuliani awjuliani merged commit da56ddd into hotfix-0.3.0c Apr 6, 2018
@awjuliani awjuliani deleted the hotfix-bootstrapping branch April 6, 2018 23:25
@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 20, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants