diff --git a/README.md b/README.md
index f955f70829..1bd1724c85 100644
--- a/README.md
+++ b/README.md
@@ -36,7 +36,7 @@ developer communities.
 * Self-play mechanism for training agents in adversarial scenarios
 * Train memory-enhanced agents using deep reinforcement learning
 * Easily definable Curriculum Learning and Generalization scenarios
-* Built-in support for Imitation Learning
+* Built-in support for [Imitation Learning](https://github.com/Unity-Technologies/ml-agents/tree/latest_release/docs/Training-Imitation-Learning.md) through Behavioral Cloning or Generative Adversarial Imitation Learning
 * Flexible agent control with On Demand Decision Making
 * Visualizing network outputs within the environment
 * Wrap learning environments as a gym
diff --git a/docs/Training-Imitation-Learning.md b/docs/Training-Imitation-Learning.md
index ac18c48cc0..a71d7ed166 100644
--- a/docs/Training-Imitation-Learning.md
+++ b/docs/Training-Imitation-Learning.md
@@ -8,7 +8,7 @@ of training a medic NPC. Instead of indirectly training a medic with the help
 of a reward function, we can give the medic real world examples of observations
 from the game and actions from a game controller to guide the medic's behavior.
 Imitation Learning uses pairs of observations and actions from
-a demonstration to learn a policy. [Video Link](https://youtu.be/kpb8ZkMBFYs).
+a demonstration to learn a policy.
 
 Imitation learning can also be used to help reinforcement learning. Especially in
 environments with sparse (i.e., infrequent or rare) rewards, the agent may never see
@@ -28,7 +28,7 @@ See Behavioral Cloning + GAIL + Curiosity + RL below.
 
 The ML-Agents toolkit provides two features that enable your agent to learn
 from demonstrations.
-In most scenarios, you should combine these two features
+In most scenarios, you can combine these two features.
 
 * GAIL (Generative Adversarial Imitation Learning) uses an adversarial approach to
   reward your Agent for behaving similar to a set of demonstrations. To use GAIL, you can add the
@@ -37,11 +37,12 @@ In most scenarios, you can combine these two features
   number of demonstrations.
 * Behavioral Cloning (BC) trains the Agent's neural network to exactly mimic the actions
   shown in a set of demonstrations.
-  [The BC feature](Training-PPO.md#optional-behavioral-cloning-using-demonstrations)
-  can be enabled on the PPO or SAC trainer. BC tends to work best when
-  there are a lot of demonstrations, or in conjunction with GAIL and/or an extrinsic reward.
+  The BC feature can be enabled on the [PPO](Training-PPO.md#optional-behavioral-cloning-using-demonstrations)
+  or [SAC](Training-SAC.md#optional-behavioral-cloning-using-demonstrations) trainer. As BC cannot generalize
+  past the examples shown in the demonstrations, BC tends to work best when there are demonstrations
+  for nearly all of the states that the agent can experience, or in conjunction with GAIL and/or an extrinsic reward.
 
-### How to Choose
+### What to Use
 
 If you want to help your agents learn (especially with environments that have sparse rewards)
 using pre-recorded demonstrations, you can generally enable both GAIL and Behavioral Cloning
@@ -55,10 +56,10 @@ example environment under `CrawlerStaticLearning` in `config/gail_config.yaml`.
 
 ## Recording Demonstrations
 
-It is possible to record demonstrations of agent behavior from the Unity Editor,
-and save them as assets. These demonstrations contain information on the
+Demonstrations of agent behavior can be recorded from the Unity Editor,
+and saved as assets. These demonstrations contain information on the
 observations, actions, and rewards for a given agent during the recording session.
-They can be managed from the Editor, as well as used for training with BC and GAIL.
+They can be managed in the Editor, as well as used for training with BC and GAIL.
 
 In order to record demonstrations from an agent, add the `Demonstration Recorder`
 component to a GameObject in the scene which contains an `Agent` component.
@@ -75,7 +76,7 @@ When `Record` is checked, a demonstration will be created whenever the scene
 is played from the Editor. Depending on the complexity of the task, anywhere
 from a few minutes or a few hours of demonstration data may be necessary to
 be useful for imitation learning. When you have recorded enough data, end
-the Editor play session, and a `.demo` file will be created in the
+the Editor play session. A `.demo` file will be created in the
 `Assets/Demonstrations` folder (by default). This file contains the demonstrations.
 Clicking on the file will provide metadata about the demonstration in the
 inspector.
@@ -85,3 +86,19 @@ inspector.
    alt="BC Teacher Helper"
    width="375"
    border="10" />
+
+You can then specify the path to this file as the `demo_path` in your `trainer_config.yaml` file
+when using BC or GAIL. For instance, for BC:
+
+```
+    behavioral_cloning:
+        demo_path: <path_to_your_demo_file>
+```
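+
+And for GAIL, set `demo_path` under the GAIL reward signal; a minimal sketch (the nesting follows `config/gail_config.yaml`, and the path is a placeholder):
+
+```
+    reward_signals:
+        gail:
+            demo_path: <path_to_your_demo_file>  # placeholder; point this at your recorded .demo file
+```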