Asymmetric self-play #3653
Merged
Commits
48 commits
e19b038 ghost controller (andrewcoh)
3335cc8 Merge branch 'master' into self-play-mutex (andrewcoh)
49f5cf4 Merge branch 'master' into self-play-mutex (andrewcoh)
33ff2ff team id centric ghost trainer (andrewcoh)
3f69db7 ELO calculation done in ghost controller (andrewcoh)
e19f9e5 removed opponent elo from stat collection (andrewcoh)
4e1e139 passing all tests locally (andrewcoh)
1741c54 fixed controller behavior when first team discovered isnt 0 (andrewcoh)
cc17ea1 no negative team id in docs (andrewcoh)
43417e1 save step on trainer step count/swap on ghost (andrewcoh)
124f886 urllib parse (andrewcoh)
8778cec Update docs/Training-Self-Play.md (andrewcoh)
33c5ea9 remove whitespace (andrewcoh)
c2eea64 Merge branch 'master' into self-play-mutex (andrewcoh)
bd86108 docstrings/ghost_swap -> team_change (andrewcoh)
82bdfc4 replaced ghost_swap with team_change in tests (andrewcoh)
cb855db docstrings for all ghost trainer functions (andrewcoh)
fb5ccd0 SELF-PLAY NOW SUPPORTS MULTIAGENT TRAINERS (andrewcoh)
c3890f5 next learning team from get step (andrewcoh)
cad0a2d comment for self.ghost_step (andrewcoh)
f68f7aa fixed export so both teams have current model (andrewcoh)
4c9ba86 updated self-play doc for asymmetric games/changed current_self->curr… (andrewcoh)
ffe2cfd count trainer steps in controller by team id (andrewcoh)
c2ae207 added team_change as a yaml config (andrewcoh)
7e0ff7b removed team-change CLI (andrewcoh)
d2dd975 fixed tests that expected old hyperparam team-change (andrewcoh)
6aae133 doc update for team_change (andrewcoh)
d560b5f removed not max step reached as condition for ELO (andrewcoh)
2bf9271 Merge branch 'master' into self-play-mutex (andrewcoh)
29435bb warning for team change hyperparam (andrewcoh)
97f1b7d simple rl asymm ghost tests (andrewcoh)
d123fe7 Merge branch 'master' into self-play-mutex (andrewcoh)
2cb5a2d renamed controller methods/doc fixes (andrewcoh)
27e924e current_best_ratio -> latest_model_ratio (andrewcoh)
f3332c3 added Foerster paper title to doc (andrewcoh)
aca54be doc fix (andrewcoh)
0e52b20 Merge branch 'master' into self-play-mutex (andrewcoh)
95469d2 doc fix (andrewcoh)
10bd9dd Merge branch 'master' into self-play-mutex (andrewcoh)
01f9de3 Merge branch 'master' into self-play-mutex (andrewcoh)
61649ea using mlagents_env.logging instead of logging (andrewcoh)
972ed63 doc fix (andrewcoh)
7e0a3ba modified doc to not include strikers vs goalie (andrewcoh)
6c5342d removed "unpredictable behavior" (andrewcoh)
9149413 Merge branch 'master' into self-play-mutex (andrewcoh)
02455a4 added to mig doc/address comments (andrewcoh)
df8b87f raise warning when latest_model_ratio not btwn 0, 1 (andrewcoh)
1333fb9 removed Goalie from learning environment examples (andrewcoh)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,48 @@ | ||
| from typing import Deque, Dict | ||
| from collections import deque | ||
| from mlagents.trainers.ghost.trainer import GhostTrainer | ||
| | ||
| | ||
| class GhostController(object): | ||
| | ||
|     def __init__(self, swap_interval: int, maxlen: int = 10): | ||
|         self._swap_interval = swap_interval | ||
|         self._last_swap: int = 0 | ||
|         self._queue: Deque[int] = deque(maxlen=maxlen) | ||
|         self._learning_team: int = -1 | ||
|         self._ghost_trainers: Dict[int, GhostTrainer] = {} | ||
| | ||
|     def subscribe_team_id(self, team_id: int, trainer: GhostTrainer) -> None: | ||
|         if team_id not in self._ghost_trainers: | ||
|             self._queue.append(team_id) | ||
|             self._ghost_trainers[team_id] = trainer | ||
|             if self._learning_team < 0: | ||
|                 self._learning_team = team_id | ||
| | ||
|     def get_learning_team(self, step: int) -> int: | ||
|         if step >= self._swap_interval + self._last_swap: | ||
|             self._last_swap = step | ||
|             self._learning_team = self._queue.popleft() | ||
|             self._queue.append(self._learning_team) | ||
|         return self._learning_team | ||
| | ||
|     # Adapted from https://github.com/Unity-Technologies/ml-agents/pull/1975 and | ||
|     # https://metinmediamath.wordpress.com/2013/11/27/how-to-calculate-the-elo-rating-including-example/ | ||
|     # ELO calculation | ||
| | ||
|     def compute_elo_rating_changes(self, rating: float, result: float) -> float: | ||
|         opponent_rating: float = 0.0 | ||
|         for team_id, trainer in self._ghost_trainers.items(): | ||
|             if team_id != self._learning_team: | ||
|                 opponent_rating = trainer.get_opponent_elo() | ||
|         r1 = pow(10, rating / 400) | ||
|         r2 = pow(10, opponent_rating / 400) | ||
| | ||
|         summed = r1 + r2 | ||
|         e1 = r1 / summed | ||
| | ||
|         change = result - e1 | ||
|         for team_id, trainer in self._ghost_trainers.items(): | ||
|             if team_id != self._learning_team: | ||
|                 trainer.change_opponent_elo(change) | ||
| | ||
|         return change | ||
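
The arithmetic inside `compute_elo_rating_changes` can be tried in isolation. The following is a minimal sketch, not the PR's code: the helper names `expected_score` and `elo_change` are hypothetical, and the opponent rating is passed in directly instead of being read from a `GhostTrainer`. Like the diff, it applies no K-factor, so the returned change is simply the result minus the expected score.

```python
def expected_score(rating: float, opponent_rating: float) -> float:
    """Expected score of a player against an opponent under the Elo model.

    Hypothetical helper; mirrors the r1 / (r1 + r2) step in the diff.
    """
    r1 = pow(10, rating / 400)
    r2 = pow(10, opponent_rating / 400)
    return r1 / (r1 + r2)


def elo_change(rating: float, opponent_rating: float, result: float) -> float:
    """Unscaled rating change: actual result minus expected score.

    `result` is assumed to be 1.0 for a win, 0.5 for a draw, 0.0 for a loss.
    No K-factor is applied here, matching the diff.
    """
    return result - expected_score(rating, opponent_rating)


if __name__ == "__main__":
    # Evenly matched teams: the expected score is 0.5, so a win yields +0.5.
    print(elo_change(1200.0, 1200.0, 1.0))
    # A 400-point favourite is expected to score 10/11, so a win moves little.
    print(elo_change(1600.0, 1200.0, 1.0))
```

Under these assumptions a win against an equally rated opponent yields a change of 0.5, while a win by a 400-point favourite yields only 1/11, because most of that outcome was already expected.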