
Conversation

@andrewcoh (Contributor) commented on Mar 26, 2020

Proposed change(s)

This PR adds the StrikersVsGoalie env as well as some refactors to Tennis and Soccer.

Tennis:

  • Removed the loss of a point when the serve hits the net. I think the agents were developing weird biases, and I suspect this rule was partly the culprit behind the boring gameplay.
  • Added a timestep penalty. The agents are now much more aggressive.

Reserving the Tennis changes for a separate PR.

Soccer:

  • Extended the AgentSoccer.cs script to support striker/goalie/generic soccer agents with different reward functions (see the sketch after this list).
  • Added a curriculum that should speed up learning
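
For concreteness, here is a minimal sketch of how the role-dependent per-step rewards fit together. It is not the exact PR code: the role names and the `1f / 3000f` magnitude come from the diff excerpts below, but the `Unity.MLAgents` API surface used here (`OnActionReceived(ActionBuffers)`) and the omission of the movement logic are assumptions.

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;

public class AgentSoccer : Agent
{
    public enum Position { Striker, Goalie, Generic }
    public Position position;

    // Accumulated per-step penalty for the Generic role; handed out only on a
    // win (see the review discussion further down).
    public float timePenalty;

    public override void OnActionReceived(ActionBuffers actions)
    {
        if (position == Position.Goalie)
        {
            // Existential bonus for Goalies: rewarded for every step survived.
            AddReward(1f / 3000f);
        }
        else if (position == Position.Striker)
        {
            // Existential penalty for Strikers: pushed to score quickly.
            AddReward(-1f / 3000f);
        }
        else
        {
            // Existential penalty cumulant for Generic: accumulated here and
            // granted at the end of the episode only if the agent wins.
            timePenalty += -1f / 3000f;
        }
        // MoveAgent(actions.DiscreteActions); // movement omitted from this sketch
    }
}
```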

Finally, I tweaked the hyperparameters for all of these environments to make training a bit more stable.

TODO:

  • Add Soccer and Tennis brains (currently retraining with the curriculum/no-net changes, but training with the original envs yielded good results).
  • Add docs for the new StrikersVsGoalie env.
  • Add a note to the self-play docs recommending a curriculum when specifying rewards.

Useful links (GitHub issues, JIRA tickets, ML-Agents forum threads, etc.)

Types of change(s)

  • Bug fix
  • New feature
  • Code refactor
  • Breaking change
  • Documentation update
  • Other (please describe)

Checklist

  • Added tests that prove my fix is effective or that my feature works
  • Updated the changelog (if applicable)
  • Updated the documentation (if applicable)
  • Updated the migration guide (if applicable)

Other comments

```csharp
else
{
    // Existential penalty cumulant for Generic
    timePenalty += -1f / 3000f;
```
@andrewcoh (author):

This way, I accumulate the timestep penalty and apply it all at once at the end of the episode, but only if the agent wins. The point is that there is never additional utility in "losing faster" (which would be the case if each agent received the timestep penalty irrespective of the outcome), while there is still utility in winning faster. The trained SoccerTwos agents look much more stable defensively, and I attribute that to this.
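
Roughly, the idea is the sketch below. It is not the PR's exact code: the controller shape, the GoalTouched method name, and the ±1 terminal reward magnitudes are assumptions. The winner's return is 1 plus its accumulated negative timePenalty, so winning faster is strictly better; the loser gets a flat -1, so ending a losing episode sooner gains nothing.

```csharp
using System.Collections.Generic;
using UnityEngine;

// Sketch of the environment-controller side of the reward shaping described
// above. Names (SoccerEnvController, GoalTouched, PlayerInfo) are illustrative.
public class SoccerEnvController : MonoBehaviour
{
    public enum Team { Blue, Purple }

    [System.Serializable]
    public class PlayerInfo
    {
        public AgentSoccer Agent;
        public Team Team;
    }

    public List<PlayerInfo> players = new List<PlayerInfo>();

    // Called when a goal is scored for scoredTeam.
    public void GoalTouched(Team scoredTeam)
    {
        foreach (var p in players)
        {
            if (p.Team == scoredTeam)
            {
                // Winner: +1 plus the accumulated (negative) time penalty,
                // so faster wins yield a higher return.
                p.Agent.AddReward(1f + p.Agent.timePenalty);
            }
            else
            {
                // Loser: flat -1 regardless of episode length, so there is
                // never additional utility in "losing faster".
                p.Agent.AddReward(-1f);
            }
            p.Agent.timePenalty = 0f; // reset for the next episode
            p.Agent.EndEpisode();
        }
    }
}
```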

@ervteng (Contributor) left a comment:

Looks good

@vincentpierre (Contributor) left a comment:

I have not yet loaded the scene to see for myself, but I left some initial feedback. I will take another pass later.

```csharp
else
{
    // Existential penalty cumulant for Generic
    timePenalty += -1f / 3000f;
```
@vincentpierre:

You only accumulate this time penalty for the "generic" position. Is that intended, or were you trying to give it to the strikers?

@andrewcoh (author):

That was intended. The strikers deserve it every timestep.

```csharp
if (position == Position.Goalie)
{
    // Existential bonus for Goalies.
    AddReward(1f / 3000f);
```
@vincentpierre:

Make this a const rather than copying 1f / 3000f in these three places.

@vincentpierre:

Also, is this number dependent on the agent's max_step?

@andrewcoh (author):

I added a private float that's set in Initialize using maxStep.
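
Something along these lines, as a sketch inside AgentSoccer (the field name m_Existential is illustrative, and it assumes MaxStep is set on the agent):

```csharp
// Replace the three hardcoded 1f / 3000f occurrences with a single per-step
// magnitude derived from the agent's MaxStep.
float m_Existential;

public override void Initialize()
{
    m_Existential = 1f / MaxStep;
}

// e.g. the goalie branch becomes:
//   AddReward(m_Existential);
// and the generic branch becomes:
//   timePenalty -= m_Existential;
```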

```csharp
}
if (c.gameObject.CompareTag("ball"))
{
    // Generic gets curriculum
```
@vincentpierre:

What does this comment mean?

@andrewcoh (author):

It means that only the generic agent gets the curriculum. I can remove the if statement, because m_BallTouch is set by the curriculum and defaults to zero when the curriculum doesn't set it.
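
For context, the ball-touch reward is then driven entirely by a curriculum parameter, roughly as in the sketch below inside AgentSoccer (the "ball_touch" key and the EnvironmentParameters API from later ML-Agents releases are assumptions about the setup; Collision requires `using UnityEngine;`):

```csharp
// m_BallTouch defaults to 0, so the ball-touch reward only exists for runs
// whose curriculum actually sets "ball_touch".
float m_BallTouch;

public override void OnEpisodeBegin()
{
    m_BallTouch = Academy.Instance.EnvironmentParameters.GetWithDefault("ball_touch", 0f);
}

void OnCollisionEnter(Collision c)
{
    if (c.gameObject.CompareTag("ball"))
    {
        // With the default of 0 this is a no-op, so no explicit
        // "only Generic gets curriculum" check is needed.
        AddReward(m_BallTouch);
    }
}
```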

```md
…the ball from entering own goal.
* Agents: The environment contains four agents, with the same
  Behavior Parameters: SoccerTwos.
* Agent Reward Function (dependent):
```
@vincentpierre:

What does "dependent" mean?

@andrewcoh (author):

The agents share the reward function.

@vincentpierre self-requested a review on April 16, 2020.
@andrewcoh merged commit 6665972 into master on April 17, 2020.
The delete-merged-branch bot deleted the soccer-2v1 branch on April 17, 2020.
The github-actions bot locked this conversation as resolved and limited it to collaborators on May 15, 2021.