# Training ML-Agents

The ML-Agents toolkit conducts training using an external Python training
process. During training, this external process communicates with the Academy
to generate a block of agent experiences. These
experiences become the training set for a neural network used to optimize the
agent's policy (which is essentially a mathematical function mapping
observations to actions). In reinforcement learning, the neural network
optimizes the policy by maximizing the expected rewards. In imitation learning,
the neural network optimizes the policy to achieve the smallest difference
between the actions chosen by the agent trainee and the actions chosen by the
expert in the same situation.

The output of the training process is a model file containing the optimized
policy. This model file is a TensorFlow data graph containing the mathematical
operations and the optimized weights selected during the training process. You
can assign the generated model file to the Behavior Parameters component of your
Agent in your Unity project so that the agent selects actions using the trained policy.

For a broad overview of reinforcement learning, imitation learning and all the
training scenarios, methods and options within the ML-Agents Toolkit, see
[ML-Agents Toolkit Overview](ML-Agents-Overview.md).

## Training with mlagents-learn
Once your learning environment has been created and is ready for training, the next
step is to initiate a training run. Training in the ML-Agents Toolkit is powered
by a dedicated Python package, `mlagents`. This package exposes a command `mlagents-learn` that
is the single entry point for all training workflows (e.g. reinforcement
learning, imitation learning, curriculum learning). Its implementation can be found at
[ml-agents/mlagents/trainers/learn.py](../ml-agents/mlagents/trainers/learn.py).

### Starting Training

`mlagents-learn` is the main training utility provided by the ML-Agents Toolkit. It
accepts a number of CLI options in addition to a YAML configuration file that contains
all the configurations and hyperparameters to be used during training. The set of
configurations and hyperparameters to include in this file depends on the agents in your
environment and the specific training method you wish to use. Keep in mind that
the hyperparameter values can have a big impact on training performance (i.e. your
agent's ability to learn a policy that solves the task). On this page, we review all the
hyperparameters for all training methods and provide guidelines and advice on their values.

To view a description of all the CLI options accepted by `mlagents-learn`, use the `--help` flag:
```sh
mlagents-learn --help
```

The basic command for training is:

```sh
mlagents-learn <trainer-config-file> --env=<env_name> --run-id=<run-identifier>
```

where

* `<trainer-config-file>` is the file path of the trainer configuration yaml. This contains all the
hyperparameter values. We offer a detailed guide on the structure of this file and the meaning
of the hyperparameters (and advice on how to set them) in the dedicated
[Training Config File](#training-config-file) section below.
* `<env_name>` __(Optional)__ is the name (including path) of your [Unity
executable](Learning-Environment-Executable.md) containing the agents to be trained.
If `<env_name>` is not passed, the training will happen in the Editor.
Press the :arrow_forward: button in Unity when the message _"Start training by
pressing the Play button in the Unity Editor"_ is displayed on the screen.
* `<run-identifier>` is a unique name you can use to identify the results of your training runs.
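
For example, a training run might look like the following (a hypothetical invocation,
assuming you have built an executable named `3DBall` into a `builds/` folder and are
using the default trainer configuration file):

```sh
# Hypothetical example: train the agents in a 3DBall build with the
# default configuration file, tagging the run as "ball_01".
mlagents-learn config/trainer_config.yaml --env=builds/3DBall --run-id=ball_01
```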

See the [Getting Started Guide](Getting-Started.md#training-a-new-model-with-reinforcement-learning)
for a sample execution of the `mlagents-learn` command.

#### Observing Training

Regardless of which training methods, configurations or hyperparameters you provide,
the training process will always generate three artifacts:
1. Summaries (under the `summaries/` folder): these are training metrics that are updated
throughout the training process. They are helpful to monitor your training performance
and may help inform how to update your hyperparameter values.
See [Using TensorBoard](Using-Tensorboard.md) for more details on how to visualize
the training metrics.
1. Models (under the `models/` folder): these contain the model checkpoints that are updated
throughout training and the final model file (`.nn`). This final model file is generated once
training either completes or is interrupted.
1. Timers file (also under the `summaries/` folder): this contains aggregated metrics on your
training process, including time spent on specific code blocks.
See [Profiling in Python](Profiling-Python.md) for more information on the timers generated.

These artifacts (except the `.nn` file) are updated throughout the training process and finalized
when training completes or is interrupted.
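
For instance, you can point TensorBoard at the `summaries/` folder to watch the training
metrics update during or after a run:

```sh
# Launch TensorBoard on the saved summaries. TensorBoard defaults to
# port 6006; pass a different --port value if that one is already in use.
tensorboard --logdir=summaries --port 6006
```

Then open [localhost:6006](http://localhost:6006) in your browser.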

#### Debugging

If you enable the `--debug` flag in the command line, the trainer metrics are logged to a CSV file
stored in the `summaries` directory. The metrics stored are:
* brain name
* time to update policy
* time since start of training
* time for last experience collection
* number of experiences used for training
* mean return

This option is not available currently for Behavioral Cloning.

> **Review thread on this section:**
>
> **@ervteng** (Contributor, Apr 14, 2020): Unfortunately none of these things are correct
> anymore, I think we can just remove the whole Debugging section. The CSV is still created
> under the `summaries/` directory, but is slated to be deprecated and replaced with a JSON
> file, so it's probably OK if we don't mention it.
>
> **Author:** So should we remove now and bring back once we add the JSON file? And also,
> remove the `--debug` option from the CLI?
>
> **@ervteng** (Contributor): +1 for remove now and bring back once we add the JSON file.
> The `--debug` CLI option is still there, but it doesn't really help the user in debugging
> training (it's mostly for us), nor does it log the CSV. So it probably doesn't go in this
> doc. Is it OK if we just leave it to the `--help` menu to describe?

#### Stopping and Resuming Training

To interrupt training and save the current progress, hit `Ctrl+C` once and wait for the
model(s) to be saved out.

To resume a previously interrupted or completed training run, use the `--resume` flag and
make sure to specify the previously used run ID.

If you would like to re-run a previously interrupted or completed training run and re-use
the same run ID (in this case, overwriting the previously generated artifacts), then
use the `--force` flag.
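
Continuing the hypothetical `ball_01` run from the earlier example, resuming or
re-running it might look like:

```sh
# Resume the interrupted run from its latest checkpoint.
mlagents-learn config/trainer_config.yaml --run-id=ball_01 --resume

# Or start the same run ID over from scratch, overwriting its artifacts.
mlagents-learn config/trainer_config.yaml --run-id=ball_01 --force
```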

#### Loading an Existing Model

You can also run inference of an already-trained model in Python by using
both the `--resume` and `--inference` flags. Note that if you want to run
inference in Unity, you should use the [Unity Inference Engine](Getting-Started.md#running-a-pre-trained-model).

Alternatively, you might want to start a new training run but _initialize_ it using an already-trained
model. You may want to do this, for instance, if your environment changed and you want
a new model, but the old behavior is still better than random. You can do this by specifying `--initialize-from=<run-identifier>`, where `<run-identifier>` is the old run ID.
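
As a sketch, again assuming the hypothetical `ball_01` run:

```sh
# Run the trained model in inference mode; no further training occurs.
mlagents-learn config/trainer_config.yaml --run-id=ball_01 --resume --inference

# Start a new run whose network weights are initialized from ball_01.
mlagents-learn config/trainer_config.yaml --run-id=ball_02 --initialize-from=ball_01
```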

### Command Line Training Options

In addition to passing the path of the Unity executable containing your training
environment, you can set the following command line options when invoking
`mlagents-learn`:

* `--env=<env>`: Specify an executable environment to train.
* `--curriculum=<file>`: Specify a curriculum JSON file for defining the
lessons for curriculum training. See [Curriculum
Training](Training-Curriculum-Learning.md) for more information.
* `--sampler=<file>`: Specify a sampler YAML file for defining the
sampler for parameter randomization. See [Environment Parameter Randomization](Training-Environment-Parameter-Randomization.md) for more information.
* `--keep-checkpoints=<n>`: Specify the maximum number of model checkpoints to
keep. Checkpoints are saved after the number of steps specified by the
`save-freq` option. Once the maximum number of checkpoints has been reached,
the oldest checkpoint is deleted when saving a new checkpoint. Defaults to 5.
* `--lesson=<n>`: Specify which lesson to start with when performing curriculum
training. Defaults to 0.
* `--num-envs=<n>`: Specifies the number of concurrent Unity environment instances to
collect experiences from when training. Defaults to 1.
* `--run-id=<run-identifier>`: Specifies an identifier for each training run. This
identifier is used to name the subdirectories in which the trained model and
summary statistics are saved as well as the saved model itself. The default id
is "ppo". If you use TensorBoard to view the training statistics, always set a
unique run-id for each training run. (The statistics for all runs with the
same id are combined as if they were produced by the same session.)
* `--save-freq=<n>`: Specifies how often (in steps) to save the model during
training. Defaults to 50000.
* `--seed=<n>`: Specifies a number to use as a seed for the random number
generator used by the training code.
* `--env-args=<string>`: Specify arguments for the executable environment. Be aware that
the standalone build will also process these as
[Unity Command Line Arguments](https://docs.unity3d.com/Manual/CommandLineArguments.html).
You should choose different argument names if you want to create environment-specific arguments.
All arguments after this flag will be passed to the executable. For example, setting
`mlagents-learn config/trainer_config.yaml --env-args --num-orcs 42` would result in
` --num-orcs 42` passed to the executable.
* `--base-port`: Specifies the starting port. Each concurrent Unity environment instance
will get assigned a port sequentially, starting from the `base-port`. Each instance
will use the port `(base_port + worker_id)`, where `worker_id` is a sequential ID
given to each instance, from 0 to `num_envs - 1`. Default is 5005. __Note:__ When
training using the Editor rather than an executable, the base port will be ignored.
* `--inference`: Specifies whether to only run in inference mode. Omit to train the model.
To load an existing model, specify a run-id and combine with `--resume`.
* `--resume`: If set, the training code loads an already trained model to
initialize the neural network before training. The learning code looks for the
model in `models/<run-id>/` (which is also where it saves models at the end of
training). This option only works when the models exist, and have the same behavior names
as the current agents in your scene.
* `--force`: Attempting to train a model with a run-id that has been used before will
throw an error. Use `--force` to force-overwrite this run-id's summary and model data.
* `--initialize-from=<run-identifier>`: Specify an old run-id here to initialize your model from
a previously trained model. Note that the previously saved models _must_ have the same behavior
parameters as your current environment.
* `--no-graphics`: Specify this option to run the Unity executable in
`-batchmode` without initializing the graphics driver. Use this only if your
training doesn't involve visual observations (reading from pixels). See
[here](https://docs.unity3d.com/Manual/CommandLineArguments.html) for more
details.
* `--debug`: Specify this option to enable debug-level logging for some parts of the code.
* `--cpu`: Forces training using CPU only.
* Engine Configuration:
  * `--width`: The width of the executable window of the environment(s) in pixels
    (ignored for editor training). (Default: 84)
  * `--height`: The height of the executable window of the environment(s) in pixels
    (ignored for editor training). (Default: 84)
  * `--quality-level`: The quality level of the environment(s). Equivalent to
    calling `QualitySettings.SetQualityLevel` in Unity. (Default: 5)
  * `--time-scale`: The time scale of the Unity environment(s). Equivalent to setting
    `Time.timeScale` in Unity. (Default: 20.0, maximum: 100.0)
  * `--target-frame-rate`: The target frame rate of the Unity environment(s).
    Equivalent to setting `Application.targetFrameRate` in Unity. (Default: -1)
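
As an illustrative sketch combining several of the options above (hypothetical values,
reusing the `3DBall` build assumed earlier):

```sh
# Train with four concurrent environment instances, headless, with a
# fixed seed and model checkpoints every 20000 steps.
mlagents-learn config/trainer_config.yaml --env=builds/3DBall --run-id=ball_03 \
    --num-envs=4 --no-graphics --seed=42 --save-freq=20000
```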

## Training Config File

The Unity ML-Agents Toolkit provides a wide range of training scenarios, methods and options.
As such, specific training runs may require different training configurations and may
generate different artifacts and TensorBoard statistics. This section offers a detailed
guide on how to manage the different training set-ups within the toolkit.

The training config files `config/trainer_config.yaml`, `config/sac_trainer_config.yaml`,
`config/gail_config.yaml` and `config/offline_bc_config.yaml` specify the training method
and the hyperparameters to use. Default settings apply to all Behaviors; you can
also add new sections to override these defaults to train specific Behaviors. Name the
override sections after the appropriate `Behavior Name`. Sections for the
example environments are included in the provided config file.

\*PPO = Proximal Policy Optimization, SAC = Soft Actor-Critic, BC = Behavioral Cloning (Imitation), GAIL = Generative Adversarial Imitation Learning

| **Setting** | **Description** | **Applies To Trainer\*** |
| :------------------- | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :----------------------- |
| batch_size | The number of experiences in each iteration of gradient descent. | PPO, SAC |
| ...                  | ...                                                                                                                                                                                        | ...                      |
| use_recurrent | Train using a recurrent neural network. See [Using Recurrent Neural Networks](Feature-Memory.md). | PPO, SAC |
| init_path | Initialize trainer from a previously saved model. | PPO, SAC |
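
To make the file's layout concrete, below is a minimal sketch of a trainer config file
(hypothetical values, assuming a Behavior named `3DBall` whose section overrides a few of
the defaults):

```yaml
# Settings under "default" apply to every Behavior unless overridden.
default:
    trainer: ppo
    batch_size: 1024
    buffer_size: 10240
    max_steps: 5.0e5
    summary_freq: 10000
    use_recurrent: false

# Override section, named after the Behavior Name it configures.
3DBall:
    normalize: true
    batch_size: 64
    buffer_size: 12000
```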


For specific advice on setting hyperparameters based on the type of training you
are conducting, see:

* [Training with PPO](Training-PPO.md)
* [Training with SAC](Training-SAC.md)
* [Training with Self-Play](Training-Self-Play.md)
* [Using Recurrent Neural Networks](Feature-Memory.md)
* [Training with Curriculum Learning](Training-Curriculum-Learning.md)
* [Training with Imitation Learning](Training-Imitation-Learning.md)
* [Training with Environment Parameter Randomization](Training-Environment-Parameter-Randomization.md)

You can also compare the [example environments](Learning-Environment-Examples.md)
to the corresponding sections of the `config/trainer_config.yaml` file for each
example to see how the hyperparameters and other configuration variables have
been changed from the defaults.
