[change] Separate action outputs into OutputDistributions object #3514
```python
    pass

@abc.abstractproperty
def total_log_probs(self) -> tf.Tensor:
```
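The hunk above declares `total_log_probs` as an abstract property on the output-distribution base class. The sketch below is a minimal stand-in for that pattern, not the PR's actual code: the class bodies, the `log_probs` property shape, and the subclass names are hypothetical, and plain floats replace `tf.Tensor`. It also shows that `abc.abstractproperty` (deprecated since Python 3.3) can be written as `@property` stacked over `@abc.abstractmethod`, and that a subclass missing one of the properties cannot be instantiated.

```python
import abc
from typing import List


class OutputDistribution(abc.ABC):
    """Hypothetical minimal base class; @property over @abc.abstractmethod
    is the modern equivalent of the deprecated @abc.abstractproperty."""

    @property
    @abc.abstractmethod
    def log_probs(self) -> List[float]:
        """Per-dimension log probabilities of the sampled action."""

    @property
    @abc.abstractmethod
    def total_log_probs(self) -> float:
        """Log probability of the whole action."""


class IncompleteDistribution(OutputDistribution):
    # Implements log_probs but not total_log_probs, so it stays abstract.
    @property
    def log_probs(self) -> List[float]:
        return [-0.5, -1.0]


class SummedDistribution(OutputDistribution):
    # Hypothetical concrete subclass that implements both properties.
    @property
    def log_probs(self) -> List[float]:
        return [-0.5, -1.0]

    @property
    def total_log_probs(self) -> float:
        return sum(self.log_probs)


try:
    IncompleteDistribution()
except TypeError as err:
    # Python refuses to instantiate a class with unimplemented abstract members.
    print("cannot instantiate:", err)

print(SummedDistribution().total_log_probs)  # -1.5
```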
Is this equivalent to reduce_sum(log_probs, axis=1) or something different?
Pretty much, but are we 100% sure that `reduce_sum` would work for all distribution types?
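A minimal sketch of the question above, assuming a diagonal Gaussian whose action dimensions are independent. In that case the joint log density equals the sum of the per-dimension log densities, which is what `reduce_sum(log_probs, axis=1)` computes for each row of a `[batch, action_dims]` tensor. It uses plain Python instead of TensorFlow, and all function names are hypothetical. The identity depends on independence: for a distribution with correlated dimensions (for example, a full-covariance Gaussian), per-dimension log probabilities do not combine this way, which is one reason the sum may not hold for every distribution type.

```python
import math
from typing import List


def gaussian_log_prob(x: float, mu: float, sigma: float) -> float:
    """Log density of the 1-D normal N(mu, sigma^2) at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2 * math.pi)


def per_dim_log_probs(action: List[float], mu: List[float], sigma: List[float]) -> List[float]:
    """Per-dimension log probs for one row of a diagonal Gaussian."""
    return [gaussian_log_prob(a, m, s) for a, m, s in zip(action, mu, sigma)]


def total_log_prob(action: List[float], mu: List[float], sigma: List[float]) -> float:
    """Plain-Python analogue of reduce_sum(log_probs, axis=1) for a single row."""
    return sum(per_dim_log_probs(action, mu, sigma))


def joint_diag_gaussian_log_prob(action: List[float], mu: List[float], sigma: List[float]) -> float:
    """Joint log density of a diagonal multivariate normal, computed directly."""
    k = len(action)
    mahalanobis = sum(((a - m) / s) ** 2 for a, m, s in zip(action, mu, sigma))
    log_det = 2 * sum(math.log(s) for s in sigma)  # log|Sigma| for a diagonal Sigma
    return -0.5 * (mahalanobis + log_det + k * math.log(2 * math.pi))


action, mu, sigma = [0.3, -1.2], [0.0, -1.0], [0.5, 2.0]
print(math.isclose(total_log_prob(action, mu, sigma),
                   joint_diag_gaussian_log_prob(action, mu, sigma)))  # True
```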
```diff
 )
-# Make entropy the right shape
-self.entropy = tf.ones_like(tf.reshape(mu[:, 0], [-1])) * single_dim_entropy
+self.entropy = distribution.entropy
```
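The removed line builds a `[batch]`-shaped tensor by multiplying `tf.ones_like` of the first action column by a scalar entropy, so `mu` contributes only its batch size. A rough sketch of why that works, assuming a Gaussian whose standard deviation is shared across the whole batch (so its entropy depends on neither `mu` nor the observation): the differential entropy of N(mu, sigma^2) is 0.5 * log(2 * pi * e * sigma^2). Summing over action dimensions is an assumption of this sketch, since the definition of `single_dim_entropy` is not shown in this excerpt; the function names are hypothetical.

```python
import math
from typing import List


def gaussian_entropy_per_dim(sigma: float) -> float:
    """Differential entropy of N(mu, sigma^2); note that it does not depend on mu."""
    return 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)


def batch_entropy(mu_batch: List[List[float]], sigma: List[float]) -> List[float]:
    """Entropy summed over action dims (assumed), repeated once per batch row.

    Mirrors tf.ones_like(tf.reshape(mu[:, 0], [-1])) * entropy: mu is used only
    for its batch size, assuming sigma is shared by every row.
    """
    total = sum(gaussian_entropy_per_dim(s) for s in sigma)
    return [total for _ in mu_batch]


# Three batch rows with two action dimensions each.
print(batch_entropy([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]], [1.0, 2.0]))
```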
Can we aggregate all occurrences of this line, i.e. line 283?
👍 Will do it in a follow-up PR. TODO: a broader discussion about standardizing the discrete and continuous distributions (this requires inference changes on the C# side).
Proposed change(s)
Rather than having the output distribution built into the Policy, we separate it out into its own class. Child classes of `OutputDistribution` include both continuous and discrete versions. This opens the door to adding more types of output distributions in the future without major changes to the Policy code.
Types of change(s)
Checklist
Other comments