StrikerVsGoalie and other self-play env improvements #3699
Conversation
Co-Authored-By: Vincent-Pierre BERGES <[email protected]>
This reverts commit 1d3e1cf.
```csharp
else
{
    // Existential penalty cumulant for Generic
    timePenalty += -1f / 3000f;
```
In this way, I accumulate the timestep penalty and give it all at once at the end of the episode if an agent wins. The point is that there's never additional utility in "losing faster", which there would be if each agent got the timestep penalty irrespective of outcome while still getting utility for winning faster. The trained SoccerTwos agents look much more stable defensively, and I attribute that to this.
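To illustrate the incentive argument above, here is a minimal Python sketch (not the Unity C# code; function names and the ±1 win/loss reward are assumptions for illustration) comparing a per-step penalty applied unconditionally against the cumulant scheme described in this PR, where the accumulated penalty is only applied on a win:

```python
# Hypothetical sketch: two time-penalty schemes over an episode of `steps`
# timesteps, with per-step penalty 1/3000 and a +1/-1 win/loss reward.
PENALTY_PER_STEP = 1.0 / 3000.0

def per_step_penalty_return(steps, won):
    """Scheme A: every agent pays the time penalty each step regardless of
    outcome. A losing agent's return improves the sooner it loses, so there
    is utility in 'losing faster'."""
    return (1.0 if won else -1.0) - steps * PENALTY_PER_STEP

def cumulant_return(steps, won):
    """Scheme B (this PR): the penalty is accumulated and only applied to a
    winner, so a loss is worth -1 no matter how long the agent held out."""
    return (1.0 - steps * PENALTY_PER_STEP) if won else -1.0

# Scheme A rewards losing at step 100 over losing at step 1000:
assert per_step_penalty_return(100, won=False) > per_step_penalty_return(1000, won=False)
# Scheme B makes all losses equally bad, removing that incentive:
assert cumulant_return(100, won=False) == cumulant_return(1000, won=False)
# Both schemes still reward winning faster:
assert cumulant_return(100, won=True) > cumulant_return(1000, won=True)
```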
ervteng
left a comment
Looks good
vincentpierre
left a comment
I have not yet loaded the scene to see for myself but I left some initial feedback. I will take another pass later.
Project/Assets/ML-Agents/Examples/Soccer/Scripts/AgentSoccer.cs (outdated; resolved)
```csharp
else
{
    // Existential penalty cumulant for Generic
    timePenalty += -1f / 3000f;
```
You only accumulate this time penalty for the "generic" position. Is that intended, or were you trying to give it to the strikers?
That was intended. The strikers deserve it every timestep.
```csharp
if (position == Position.Goalie)
{
    // Existential bonus for Goalies.
    AddReward(1f / 3000f);
```
Make this a const rather than copying 1f / 3000f in these three places.
Also, is this number dependent on the agent's max_step?
I added a private float that's set in Initialize using maxStep.
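The idea of deriving the per-step existential reward from the episode length can be sketched as follows (a hypothetical Python stand-in, not the actual C# field or Initialize code; the function and variable names are assumptions):

```python
# Illustrative sketch: derive the per-step existential reward from maxStep,
# so the reward over a full episode sums to exactly 1 regardless of the
# configured episode length (mirroring "a private float set in Initialize").
def existential_reward_per_step(max_step):
    return 1.0 / max_step

m_existential = existential_reward_per_step(3000)  # with maxStep = 3000

# With maxStep = 3000 this reproduces the hard-coded 1f / 3000f constant:
assert abs(m_existential - 1.0 / 3000.0) < 1e-15
# A full-length episode accumulates a total existential reward of 1:
assert abs(3000 * m_existential - 1.0) < 1e-12
```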
```csharp
}
if (c.gameObject.CompareTag("ball"))
{
    // Generic gets curriculum
```
What does this comment mean?
That only the generic agent gets curriculum. I can remove the if statement because the default value of m_BallTouch will be zero since it's set using curriculum.
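The point about the default value making the guard redundant can be illustrated with a small Python stand-in (hypothetical names mirroring the C# m_BallTouch field; the reward shape here is an assumption for illustration):

```python
# Illustrative sketch: if m_BallTouch defaults to 0 and is only raised via
# curriculum, the ball-touch reward term is already a no-op for agents that
# don't receive curriculum, so the guarding if-statement can be removed.
def ball_touch_reward(m_ball_touch, touched_ball):
    return m_ball_touch if touched_ball else 0.0

# Default (no curriculum): touching the ball adds nothing.
assert ball_touch_reward(0.0, touched_ball=True) == 0.0
# Curriculum-set value: touches are rewarded without any extra guard.
assert ball_touch_reward(0.2, touched_ball=True) == 0.2
```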
```markdown
  the ball from entering own goal.
* Agents: The environment contains four agents, with the same
  Behavior Parameters: SoccerTwos.
* Agent Reward Function (dependent):
```
What does dependent mean?
The agents share the reward function.
Proposed change(s)
This PR adds the StrikersVsGoalie env as well as some refactors to Tennis and Soccer
Tennis:
- Removed the net causing a loss of a point on the serve. I think the agents were developing weird biases and that this was partially the culprit behind the boring game play.
- Added a timestep penalty. The agents are now much more aggressive.

(Reserving the tennis changes for a separate PR.)
Soccer
Finally, I tweaked the hyperparameters for all these envs to be a bit more stable in training.
TODO:
- Add Soccer and Tennis brains (currently retraining with curriculum/no net, but training with the original envs yielded good results).
- Add to docs for new StrikerVsGoalie env.
- Add to self-play docs regarding curriculum recommendation in specifying rewards.

Useful links (GitHub issues, JIRA tickets, ML-Agents forum threads etc.)