GAIL and Pretraining #2118
Merged
146 commits
eb4abf2 New version of GAIL (awjuliani)
d0852ac Move Curiosity to separate class (awjuliani)
4b15b80 Curiosity fully working under new system (awjuliani)
ad9381b Begin implementing GAIL (awjuliani)
8bf8302 fix discrete curiosity (vincentpierre)
d3e244e Add expert demonstration (awjuliani)
a5b95f7 Remove notebook (awjuliani)
dc2fcaa Record intrinsic rewards properly (awjuliani)
49cff40 Add gail model updating (awjuliani)
48d3769 Code cleanup (awjuliani)
6eeb565 Nested structure for intrinsic rewards (awjuliani)
8ca7728 Rename files (awjuliani)
226b5c7 Update models so files (awjuliani)
3386aa7 fix typo (awjuliani)
6799756 Add reward strength parameter (awjuliani)
468c407 Use dictionary of reward signals (awjuliani)
519e2d3 Remove reward manager (awjuliani)
7df1a69 Extrinsic reward just another type (awjuliani)
99237cd Clean up imports (awjuliani)
9fa51c1 All reward signals use strength to scale output (awjuliani)
7f24677 produce scaled and unscaled reward (awjuliani)
4a714d0 Remove unused dictionary (awjuliani)
3e2671d Current trainer config (awjuliani)
77211d8 Add discrete control and pyramid experimentation (awjuliani)
2334de8 Minor changes to GAIL (awjuliani)
439387e Add relevant strength parameters (awjuliani)
ba793a3 Replace string (awjuliani)
a52ba0b Add support for visual observations w/ GAIL (awjuliani)
5b2ef22 Finish implementing visual obs for GAIL (awjuliani)
13542b4 Include demo files (awjuliani)
ae7a8b0 Fix for RNN w/ GAIL (awjuliani)
bf89082 Keep track of reward streams separately (awjuliani)
360482b Bootstrap value estimates separately (awjuliani)
c78639d Add value head (awjuliani)
3b2485d Use separate value streams for each reward (awjuliani)
40bc9ba Add VAIL (awjuliani)
c6e1504 Use adaptive B (awjuliani)
60d9ff7 Comments improvements (vincentpierre)
49ec682 Added comments and refactored a piece of the code (vincentpierre)
d9847e0 Added Comments (vincentpierre)
dc7620b Fix on Curiosity (vincentpierre)
28e0bd5 Fixed typo (vincentpierre)
0257d2b Added a forgotten comment (vincentpierre)
fd55c00 Stabilized Vail learning. Still no learning for Walker (vincentpierre)
2343b3f Fixing typo on curiosity when using visual input (vincentpierre)
c74ad19 Added some comments (vincentpierre)
2dd7c61 modified the hyperparameters (vincentpierre)
42429a5 Fixed some of the tests, will need to refactor the reward signals in … (vincentpierre)
ec0e106 Putting the has_updated flags inside each reward signal (vincentpierre)
6ae1c2f Added comments for the GAIL update method (vincentpierre)
ef65bc2 initial commit (vincentpierre)
8cbdbf4 No more normalization after pre-training (vincentpierre)
3f35d45 Fixed large bug in Vail (vincentpierre)
3be9be7 BUG FIX VAIL: The noise dimension was wrong and the discriminator sc… (vincentpierre)
9e9b4ff implemented discrete control pretraining (vincentpierre)
d537a6b bug fixing (vincentpierre)
713263c Bug fix, still not tested for recurrent (vincentpierre)
ca5b948 Fixing beta in GAIL so it will change properly (vincentpierre)
671629e Allow for not specifying an extrinsic reward
a31c8a5 Rough implementation of annealed BC
93cb4ff Fixes for rebase onto v0.8
6534291 Moved BC trainer out of reward_signals and code cleanup
700b478 Rename folder to "components"
71eedf5 Fix renaming in Curiosity
83b4603 Remove demo_aided as a required param
9e4b4e2 Make old BC compatible
f814432 Fix visual obs for curiosity
e10194f Tweaks all around
fdcfb30 Add reward normalization and bug fix
cb5e927 Load multiple .demo files. Fix bug with csv nans
2c5c853 Remove reward normalization
e66a343 Rename demo_aided to pretraining
0a98289 Fix bc configs
cd6e498 Increase small val to prevent NaNs
d23f6f3 Fix init in components
d93e36e Merge remote-tracking branch 'origin/develop' into develop-irl-ervin
1bf68c7 Fix PPO tests
9da6e6c Refactor components into common location
4a57a32 Minor code cleanup
11cc6f9 Preliminary RNN support
e66a6f7 Revert regression with NaNs for LSTMs
bea2bc7 Better LSTM support for BC
6302a55 Code cleanup and black reformat
d1cded9 Remove demo_helper and reformat signal
2b98f3b Tests for GAIL and curiosity
440146b Fix Black again...
98f9160 Tests for BCModule and visual tests for RewardSignals
5c923cb Refactor to new structure and use class generator
e7ce888 Generalize reward_signal interface and stats
858194f Fix incorrect environment reward reporting
28bceba Rename reward signals for consistency, clean up comments
248cae4 Default trainer config (for cloud testing)
744df94 Remove "curiosity_enc_size" from the regular params
31dabfc Fix PushBlock config
a557f84 Revert Pyramids environment
d4dbddb Fix indexing issue with add_experiences
ddb673b Fix tests
975e05b Change to BCModule
a83fd5d Merge branch 'develop' into develop-irl-ervin
fae7646 Remove the bools for reward signals
5cf98ac Make update take in a mini buffer rather than the
d1afc9b Always reference reward signals name and not index
80f2c75 More code cleanup
394b25a Clean up reward_signal abstract class
a9724a3 Fix issue with recording values
66fef61 Add use_actions to GAIL
0e3be1d Add documentation for Reward Signals
015f50d Add documentation for GAIL
7c3059b Remove unused variables in BCModel
16c3c06 Remove Entropy Reward Signal
1fbfa5d Change tests to use safe_load
f9a3808 Don't use mutable default
ce551bf Set defaults in parent __init__ (Reward Signals)
3e7ea5b Remove unnecessary lines
eda6993 Merge branch 'develop' into develop-irl-ervin
cace2e6 Make some files same as develop
3f161fc Add demos for example envs
2794c75 Update docs
48b7b43 Fix tests, imports, cleanup code
f47b173 Make pretrainer stats similar to reward signal
1e257d4 Merge branch 'develop' of github.com:Unity-Technologies/ml-agents int…
a8b5d09 Fixes after merge develop
fb3d5ae Additional tests, bugfix for LSTM+BC+Visual
7e0a677 GAIL code cleanup
1953233 Add types to BCModel
593f819 Fix bugs with incorrect return values
98b7732 Change tests to use RewardSignalResult
6ee0c63 Add docs for pretraining and plot for all three
6d37be2 Fix bug with demo loading directories, add test
c672ad9 Add typing to BCModule, GAIL, and demo loader
61e84c6 Fix black
9d43336 Fix mypy issues
99a2a3c Codacy cleanup
cbb1af3 Doc fixes
736c807 More sophisticated tests for reward signals
04e22fd Fix bug in GAIL when num_sequences is 1
8ead02e Clean up use_vail and feed_dicts
71f85e1 Change to swish from learningmodel
5537e60 Make variables more readable
73d20cb Code and comment cleanup
f4950b4 Not all should be swish
6784ee6 Remove prints
2704e62 Doc updates
1206a89 Make VAIL default false, improve logging
2407a5a Fix tests for sequences
4aa033b Change max_batches and set VAIL to default to false
(16 binary files not shown)
# Training with Behavioral Cloning

There are a variety of imitation learning algorithms that can be used; the
simplest of them is Behavioral Cloning. It works by collecting demonstrations
from a teacher, and then simply using them to directly learn a policy, in the
same way that supervised learning works for image classification or other
traditional machine learning tasks.
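
To make the supervised-learning analogy concrete, the sketch below shows a
single behavioral cloning update. It is only an illustration (written in
PyTorch, not the ML-Agents TensorFlow code), and the network shape, optimizer,
and loss are assumptions for a continuous-action agent.

```python
import torch
import torch.nn as nn

# Hypothetical sizes for a vector-observation agent; real shapes depend
# on the environment.
OBS_SIZE, ACT_SIZE = 8, 2

# A small policy network (illustrative, not the ML-Agents architecture).
policy = nn.Sequential(
    nn.Linear(OBS_SIZE, 64),
    nn.ReLU(),
    nn.Linear(64, ACT_SIZE),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)

def bc_update(demo_obs: torch.Tensor, demo_actions: torch.Tensor) -> float:
    """One supervised step: regress the policy's output onto the teacher's
    recorded (continuous) actions, with demonstrations as labeled data."""
    loss = nn.functional.mse_loss(policy(demo_obs), demo_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example usage with a random stand-in for a demonstration batch:
demo_obs = torch.randn(32, OBS_SIZE)
demo_actions = torch.randn(32, ACT_SIZE)
print(bc_update(demo_obs, demo_actions))
```

For discrete actions, a cross-entropy loss over the teacher's chosen actions
would play the same role.
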
## Offline Training

With offline behavioral cloning, we can use demonstrations (`.demo` files)
generated using the `Demonstration Recorder` as the dataset used to train a
behavior.

1. Choose an agent you would like to train to imitate a set of demonstrations.
2. Record a set of demonstrations using the `Demonstration Recorder` (see
   [here](Training-Imitation-Learning.md)). For illustrative purposes we will
   refer to this file as `AgentRecording.demo`.
3. Build the scene, assigning the agent a Learning Brain, and set the Brain to
   Control in the Broadcast Hub. For more information on Brains, see
   [here](Learning-Environment-Design-Brains.md).
4. Open the `config/offline_bc_config.yaml` file.
5. Modify the `demo_path` parameter in the file to reference the path to the
   demonstration file recorded in step 2. In our case this is:
   `./UnitySDK/Assets/Demonstrations/AgentRecording.demo` (a sketch of such a
   config entry follows this list).
6. Launch `mlagents-learn`, providing `./config/offline_bc_config.yaml`
   as the config parameter, and include the `--run-id` and `--train` flags as
   usual. Provide your environment as the `--env` parameter if it has been
   compiled as a standalone, or omit it to train in the Editor.
7. (Optional) Observe training performance using TensorBoard.
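
The exact contents of `config/offline_bc_config.yaml` are not shown here, so
the entry below is only a minimal sketch of what steps 4 and 5 describe; only
`demo_path` is named above, and the other keys and values are illustrative
assumptions.

```yaml
# Hypothetical offline BC entry; `demo_path` comes from step 5 above,
# the remaining keys and values are placeholders.
default:
    trainer: offline_bc   # assumed trainer name, matching the file name
    batch_size: 64        # assumed value
    max_steps: 5.0e4      # assumed value
    demo_path: ./UnitySDK/Assets/Demonstrations/AgentRecording.demo
```

Training would then be launched as in step 6, for example:
`mlagents-learn ./config/offline_bc_config.yaml --run-id=bc_run --train`.
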
This will use the demonstration file to train a neural-network-driven agent
to directly imitate the actions provided in the demonstration. The environment
will launch and be used for evaluating the agent's performance during training.

## Online Training

It is also possible to provide demonstrations in real time during training,
without pre-recording a demonstration file. The steps to do this are as
follows:

1. First create two Brains, one which will be the "Teacher," and the other
   which will be the "Student." We will assume that the names of the Brain
   Assets are "Teacher" and "Student" respectively.
2. The "Teacher" Brain must be a **Player Brain**. You must properly
   configure the inputs to map to the corresponding actions.
3. The "Student" Brain must be a **Learning Brain**.
4. The Brain Parameters of both the "Teacher" and "Student" Brains must be
   compatible with the agent.
5. Drag both the "Teacher" and "Student" Brains into the Academy's
   `Broadcast Hub` and check the `Control` checkbox on the "Student" Brain.
6. Link the Brains to the desired Agents (one Agent as the teacher and at
   least one Agent as a student).
7. In `config/online_bc_config.yaml`, add an entry for the "Student" Brain
   (a sketch of such an entry follows this list). Set the `trainer` parameter
   of this entry to `online_bc`, and the `brain_to_imitate` parameter to the
   name of the teacher Brain: "Teacher". Additionally, set
   `batches_per_epoch`, which controls how much training to do each moment.
   Increase the `max_steps` option if you'd like to keep training the Agents
   for a longer period of time.
8. Launch the training process with `mlagents-learn config/online_bc_config.yaml
   --train --slow`, and press the :arrow_forward: button in Unity when the
   message _"Start training by pressing the Play button in the Unity Editor"_
   is displayed on the screen.
9. From the Unity window, control the Agent with the Teacher Brain by providing
   "teacher demonstrations" of the behavior you would like to see.
10. Watch as the Agent(s) with the Student Brain attached begin to behave
    similarly to the demonstrations.
11. Once the Student Agents are exhibiting the desired behavior, end the
    training process with `CTRL+C` from the command line.
12. Move the resulting `*.nn` file into the `TFModels` subdirectory of the
    Assets folder (or a subdirectory within Assets of your choosing), and use
    it with the Learning Brain.
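
To make step 7 concrete, a "Student" entry in `config/online_bc_config.yaml`
might look like the sketch below. Only `trainer`, `brain_to_imitate`,
`batches_per_epoch`, and `max_steps` are named in the steps above; the values
shown are illustrative assumptions.

```yaml
# Hypothetical "Student" entry; key names come from step 7 above,
# values are placeholders.
Student:
    trainer: online_bc
    brain_to_imitate: Teacher
    batches_per_epoch: 5      # how much training to do each moment
    max_steps: 5.0e4          # increase to train for longer
```
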
**BC Teacher Helper**

We provide a convenience utility, the `BC Teacher Helper` component, which you
can add to the Teacher Agent.

<p align="center">
  <img src="images/bc_teacher_helper.png"
       alt="BC Teacher Helper"
       width="375" border="10" />
</p>

This utility enables you to use keyboard shortcuts to do the following:

1. Start and stop recording experiences. This is useful in case you'd like to
   interact with the game _but not have the agents learn from these
   interactions_. The default command to toggle this is to press `R` on the
   keyboard.

2. Reset the training buffer. This enables you to instruct the agents to
   forget their buffer of recent experiences. This is useful if you'd like
   them to quickly learn a new behavior. The default command to reset the
   buffer is to press `C` on the keyboard.
Review discussion:

This one might be a little confusing to people.

I agree, but I also think it's necessary for people who have a huge demonstration dataset. We could do a couple of things:

I agree that it is useful to have. I think it just needs a different name that is a little more descriptive. "Samples Per Update" could fit the bill.

Done! (and yes, any issues were related to the stochasticity)