Remove Offline BC Training #2969
Changes from 14 commits
48d871f
3d18773
cedd0a0
94fd18f
3612e62
14f36a0
ee8c1e6
c618c85
29df9b6
82c5ca2
71228a1
2866892
7c672f1
509f29f
1cf6777
d5bb854
698d0cb
522d8c5
This file was deleted.
@@ -224,24 +224,22 @@ the agent will need to remember in order to successfully complete the task.
 
 Typical Range: `64` - `512`
 
-## (Optional) Pretraining Using Demonstrations
+## (Optional) Behavioral Cloning Using Demonstrations
 
 In some cases, you might want to bootstrap the agent's policy using behavior recorded
-from a player. This can help guide the agent towards the reward. Pretraining adds
+from a player. This can help guide the agent towards the reward. Behavioral Cloning (BC) adds
 training operations that mimic a demonstration rather than attempting to maximize reward.
-It is essentially equivalent to running [behavioral cloning](Training-Behavioral-Cloning.md)
-in-line with PPO.
 
-To use pretraining, add a `pretraining` section to the trainer_config. For instance:
+To use BC, add a `behavioral_cloning` section to the trainer_config. For instance:
 
 ```
-pretraining:
+behavioral_cloning:
     demo_path: ./demos/ExpertPyramid.demo
     strength: 0.5
     steps: 10000
 ```
 
-Below are the available hyperparameters for pretraining.
+Below are the available hyperparameters for BC.
 
 ### Strength
 
@@ -258,10 +256,10 @@ See the [imitation learning guide](Training-Imitation-Learning.md) for more on `
 
 ### Steps
 
-During pretraining, it is often desirable to stop using demonstrations after the agent has
+During BC, it is often desirable to stop using demonstrations after the agent has
 "seen" rewards, and allow it to optimize past the available demonstrations and/or generalize
 outside of the provided demonstrations. `steps` corresponds to the training steps over which
-pretraining is active. The learning rate of the pretrainer will anneal over the steps. Set
+BC is active. The learning rate of the cloning will anneal over the steps. Set
 the steps to 0 for constant imitation over the entire training run.
 
 ### (Optional) Batch Size
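For anyone updating an existing trainer config after this rename, the change amounts to moving the `pretraining` section to the `behavioral_cloning` key with its contents untouched. The snippet below is only a sketch of that idea: the helper `rename_pretraining_section` is hypothetical (it is not part of this PR or of ML-Agents), and it assumes the config for a single behavior has already been loaded into a plain Python dict.

```python
import copy

OLD_KEY = "pretraining"          # section name removed by this change
NEW_KEY = "behavioral_cloning"   # section name introduced by this change


def rename_pretraining_section(trainer_config: dict) -> dict:
    """Hypothetical helper: return a copy of one behavior's config with the
    `pretraining` section moved to `behavioral_cloning`.

    The input dict is left unmodified.
    """
    migrated = copy.deepcopy(trainer_config)
    if OLD_KEY in migrated:
        if NEW_KEY in migrated:
            # Refuse to guess which section should win.
            raise ValueError(
                f"config defines both '{OLD_KEY}' and '{NEW_KEY}'"
            )
        migrated[NEW_KEY] = migrated.pop(OLD_KEY)
    return migrated


# Values taken from the example in the diff above.
old_config = {
    "pretraining": {
        "demo_path": "./demos/ExpertPyramid.demo",
        "strength": 0.5,
        "steps": 10000,
    }
}
new_config = rename_pretraining_section(old_config)
```

Working on a deep copy keeps the original dict available for comparison, and the explicit error avoids silently overwriting a `behavioral_cloning` section that already exists.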
@@ -239,24 +239,22 @@ default.
 
 Default: `False`
 
-## (Optional) Pretraining Using Demonstrations
+## (Optional) Behavioral Cloning Using Demonstrations
 
 In some cases, you might want to bootstrap the agent's policy using behavior recorded
-from a player. This can help guide the agent towards the reward. Pretraining adds
+from a player. This can help guide the agent towards the reward. Behavioral Cloning (BC) adds
 training operations that mimic a demonstration rather than attempting to maximize reward.
-It is essentially equivalent to running [behavioral cloning](./Training-Behavioral-Cloning.md)
-in-line with SAC.
 
-To use pretraining, add a `pretraining` section to the trainer_config. For instance:
+To use BC, add a `behavioral_cloning` section to the trainer_config. For instance:
 
 ```
-pretraining:
+behavioral_cloning:
     demo_path: ./demos/ExpertPyramid.demo
     strength: 0.5
     steps: 10000
 ```
 
-Below are the available hyperparameters for pretraining.
+Below are the available hyperparameters for BC.
Contributor
Should Behavioral Cloning be abbreviated? I think it would be better to keep it consistent in the docs. What do you think?
Contributor Author
I abbreviated it since we abbreviate PPO, SAC, and GAIL. I think the first mention on any particular page should be written out in full with the abbreviation in parentheses, and abbreviated after that. What do you think?
Contributor
Looks good.
 
 ### Strength
 
@@ -273,10 +271,10 @@ See the [imitation learning guide](Training-Imitation-Learning.md) for more on `
 
 ### Steps
 
-During pretraining, it is often desirable to stop using demonstrations after the agent has
+During BC, it is often desirable to stop using demonstrations after the agent has
 "seen" rewards, and allow it to optimize past the available demonstrations and/or generalize
 outside of the provided demonstrations. `steps` corresponds to the training steps over which
-pretraining is active. The learning rate of the pretrainer will anneal over the steps. Set
+BC is active. The learning rate of the pretrainer will anneal over the steps. Set
 the steps to 0 for constant imitation over the entire training run.
 
 ### (Optional) Batch Size
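The `steps` text in the hunks above says the BC learning rate anneals over `steps` training steps and that `0` means constant imitation for the whole run. A rough illustration of that behavior follows. It assumes a linear decay to zero, which the diff does not state, and the function name `bc_learning_rate` and its parameters are made up for this sketch rather than taken from the trainer code.

```python
def bc_learning_rate(initial_lr: float, current_step: int, steps: int) -> float:
    """Illustrative BC learning-rate schedule.

    Assumption: the rate decays linearly from `initial_lr` to 0 over `steps`
    training steps. The real trainer's schedule may differ.
    """
    if steps <= 0:
        # steps == 0: imitation stays active at a constant rate for the
        # entire training run.
        return initial_lr
    # Fraction of the BC phase still remaining, clamped so the rate never
    # goes negative once training has passed `steps`.
    remaining = max(0.0, 1.0 - current_step / steps)
    return initial_lr * remaining


# Hypothetical values: 3e-4 is an arbitrary starting rate, 10000 matches
# the `steps` value in the config example.
for step in (0, 5000, 10000, 20000):
    print(step, bc_learning_rate(3e-4, step, steps=10000))
```

Under this assumed schedule the demonstrations stop influencing updates once `current_step` reaches `steps`, which matches the stated goal of letting the agent optimize past the available demonstrations.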
This file was deleted.
Note that you'll need to change the legend in the linked image.
Changed