
Commit 61a9548

Author: Chris Elion
Doc changes for making Academy non-virtual (#3195)
Parent: 839794a

9 files changed: +65 −143 lines

docs/Feature-Monitor.md

Lines changed: 3 additions & 3 deletions
@@ -9,13 +9,13 @@ You can track many different things both related and unrelated to the agents
 themselves. By default, the Monitor is only active in the *inference* phase, so
 not during training. To change this behavior, you can activate or deactivate it
 by calling `SetActive(boolean)`. For example to also show the monitor during
-training, you can call it in the `InitializeAcademy()` method of your `Academy`:
+training, you can call it in the `Awake()` method of your `MonoBehaviour`:
 
 ```csharp
 using MLAgents;
 
-public class YourAcademy : Academy {
-    public override void InitializeAcademy()
+public class MyBehaviour : MonoBehaviour {
+    public void Awake()
     {
         Monitor.SetActive(true);
     }
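
The hunk above cuts the example off before the closing braces. Read in full, the new pattern would look roughly like this minimal sketch (the class name is just the docs' placeholder):

```csharp
using MLAgents;
using UnityEngine;

public class MyBehaviour : MonoBehaviour
{
    public void Awake()
    {
        // Show the Monitor during training as well as inference.
        Monitor.SetActive(true);
    }
}
```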

docs/Getting-Started-with-Balance-Ball.md

Lines changed: 1 addition & 13 deletions
@@ -50,19 +50,7 @@ to speed up training since all twelve agents contribute to training in parallel.
 
 ### Academy
 
-The Academy object for the scene is placed on the Ball3DAcademy GameObject. Since
-the base Academy class is abstract, you must always define a subclass. There are
-three functions you can implement, though they are all optional:
-
-* Academy.InitializeAcademy() — Called once when the environment is launched.
-* Academy.AcademyStep() — Called at every simulation step before
-  agent.AgentAction() (and after the Agents collect their observations).
-* Academy.AcademyReset() — Called when the Academy starts or restarts the
-  simulation (including the first time).
-
-The 3D Balance Ball environment does not use these functions — each Agent resets
-itself when needed — but many environments do use these functions to control the
-environment around the Agents.
+The Academy object for the scene is placed on the Ball3DAcademy GameObject.
 
 ### Agent

docs/Learning-Environment-Create-New.md

Lines changed: 6 additions & 45 deletions
@@ -17,10 +17,8 @@ steps:
 1. Create an environment for your agents to live in. An environment can range
    from a simple physical simulation containing a few objects to an entire game
    or ecosystem.
-2. Implement an Academy subclass and add it to a GameObject in the Unity scene
-   containing the environment. Your Academy class can implement a few optional
-   methods to update the scene independently of any agents. For example, you can
-   add, move, or delete agents and other entities in the environment.
+2. Add an Academy MonoBehaviour to a GameObject in the Unity scene
+   containing the environment.
 3. Implement your Agent subclasses. An Agent subclass defines the code an Agent
    uses to observe its environment, to carry out assigned actions, and to
    calculate the rewards used for reinforcement training. You can also implement

@@ -115,46 +113,16 @@ component later in the tutorial.
 You can adjust the camera angles to give a better view of the scene at runtime.
 The next steps will be to create and add the ML-Agent components.
 
-## Implement an Academy
-
+## Add an Academy
 The Academy object coordinates the ML-Agents in the scene and drives the
 decision-making portion of the simulation loop. Every ML-Agent scene needs one
-Academy instance. Since the base Academy class is abstract, you must make your
-own subclass even if you don't need to use any of the methods for a particular
-environment.
+(and only one) Academy instance.
 
-First, add a New Script component to the Academy GameObject created earlier:
+First, add an Academy component to the Academy GameObject created earlier:
 
 1. Select the Academy GameObject to view it in the Inspector window.
 2. Click **Add Component**.
-3. Click **New Script** in the list of components (at the bottom).
-4. Name the script "RollerAcademy".
-5. Click **Create and Add**.
-
-Next, edit the new `RollerAcademy` script:
-
-1. In the Unity Project window, double-click the `RollerAcademy` script to open
-   it in your code editor. (By default new scripts are placed directly in the
-   **Assets** folder.)
-2. In the code editor, add the statement, `using MLAgents;`.
-3. Change the base class from `MonoBehaviour` to `Academy`.
-4. Delete the `Start()` and `Update()` methods that were added by default.
-
-In such a basic scene, we don't need the Academy to initialize, reset, or
-otherwise control any objects in the environment so we have the simplest
-possible Academy implementation:
-
-```csharp
-using MLAgents;
-
-public class RollerAcademy : Academy { }
-```
-
-The default settings for the Academy properties are also fine for this
-environment, so we don't need to change anything for the RollerAcademy component
-in the Inspector window.
-
-![The Academy properties](images/mlagents-NewTutAcademy.png)
+3. Select **Academy** in the list of components.
 
 ## Implement an Agent
 

@@ -179,13 +147,6 @@ So far, these are the basic steps that you would use to add ML-Agents to any
 Unity project. Next, we will add the logic that will let our Agent learn to roll
 to the cube using reinforcement learning.
 
-In this simple scenario, we don't use the Academy object to control the
-environment. If we wanted to change the environment, for example change the size
-of the floor or add or remove agents or other objects before or during the
-simulation, we could implement the appropriate methods in the Academy. Instead,
-we will have the Agent do all the work of resetting itself and the target when
-it succeeds or falls trying.
-
 ### Initialization and Resetting the Agent
 
 When the Agent reaches its target, it marks itself done and its Agent reset
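
With the Academy out of the reset path, the reset logic in this tutorial lives entirely on the Agent. As a rough, non-authoritative sketch of what that looks like (mirroring the `RollerAgent` built later in the tutorial), the Agent's own `AgentReset()` override handles the fallen/starting case:

```csharp
using MLAgents;
using UnityEngine;

public class RollerAgent : Agent
{
    Rigidbody m_RBody;

    void Start()
    {
        m_RBody = GetComponent<Rigidbody>();
    }

    public override void AgentReset()
    {
        if (transform.position.y < 0)
        {
            // The Agent fell off the platform: zero its momentum and move it back up.
            m_RBody.angularVelocity = Vector3.zero;
            m_RBody.velocity = Vector3.zero;
            transform.position = new Vector3(0, 0.5f, 0);
        }
    }
}
```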

docs/Learning-Environment-Design-Academy.md

Lines changed: 0 additions & 49 deletions
This file was deleted.

docs/Learning-Environment-Design.md

Lines changed: 34 additions & 24 deletions
@@ -39,15 +39,14 @@ use.
 
 The ML-Agents Academy class orchestrates the agent simulation loop as follows:
 
-1. Calls your Academy subclass's `AcademyReset()` function.
+1. Calls your Academy's `OnEnvironmentReset` delegate.
 2. Calls the `AgentReset()` function for each Agent in the scene.
 3. Calls the `CollectObservations()` function for each Agent in the scene.
 4. Uses each Agent's Policy to decide on the Agent's next action.
-5. Calls your subclass's `AcademyStep()` function.
-6. Calls the `AgentAction()` function for each Agent in the scene, passing in
+5. Calls the `AgentAction()` function for each Agent in the scene, passing in
    the action chosen by the Agent's Policy. (This function is not called if the
    Agent is done.)
-7. Calls the Agent's `AgentOnDone()` function if the Agent has reached its `Max
+6. Calls the Agent's `AgentOnDone()` function if the Agent has reached its `Max
    Step` count or has otherwise marked itself as `done`. Optionally, you can set
    an Agent to restart if it finishes before the end of an episode. In this
    case, the Academy calls the `AgentReset()` function.

@@ -57,7 +56,7 @@ implement the above methods. The `Agent.CollectObservations()` and
 `Agent.AgentAction()` functions are required; the other methods are optional —
 whether you need to implement them or not depends on your specific scenario.
 
-**Note:** The API used by the Python PPO training process to communicate with
+**Note:** The API used by the Python training process to communicate with
 and control the Academy during training can be used for other purposes as well.
 For example, you could use the API to use Unity as the simulation engine for
 your own machine learning algorithms. See [Python API](Python-API.md) for more

@@ -66,32 +65,43 @@ information.
 ## Organizing the Unity Scene
 
 To train and use the ML-Agents toolkit in a Unity scene, the scene must contain
-a single Academy subclass and as many Agent subclasses
-as you need.
+a single Academy and as many Agent subclasses as you need.
 Agent instances should be attached to the GameObject representing that Agent.
 
 ### Academy
 
 The Academy object orchestrates Agents and their decision making processes. Only
 place a single Academy object in a scene.
 
-You must create a subclass of the Academy class (since the base class is
-abstract). When you create your Academy subclass, you can implement the
-following methods (all are optional):
-
-* `InitializeAcademy()` — Prepare the environment the first time it launches.
-* `AcademyReset()` — Prepare the environment and Agents for the next training
-  episode. Use this function to place and initialize entities in the scene as
-  necessary.
-* `AcademyStep()` — Prepare the environment for the next simulation step. The
-  base Academy class calls this function before calling any `AgentAction()`
-  methods for the current step. You can use this function to update other
-  objects in the scene before the Agents take their actions. Note that the
-  Agents have already collected their observations and chosen an action before
-  the Academy invokes this method.
-
-See [Academy](Learning-Environment-Design-Academy.md) for a complete list of
-the Academy properties and their uses.
+#### Academy resetting
+To alter the environment at the start of each episode, add your method to the Academy's OnEnvironmentReset action.
+
+```csharp
+public class MySceneBehavior : MonoBehaviour
+{
+    public void Awake()
+    {
+        var academy = FindObjectOfType<Academy>();
+        academy.LazyInitialization();
+        academy.OnEnvironmentReset += EnvironmentReset;
+    }
+
+    void EnvironmentReset()
+    {
+        // Reset the scene here
+    }
+}
+```
+
+For example, you might want to reset an Agent to its starting
+position or move a goal to a random position. An environment resets when the
+`reset()` method is called on the Python `UnityEnvironment`.
+
+When you reset an environment, consider the factors that should change so that
+training is generalizable to different conditions. For example, if you were
+training a maze-solving agent, you would probably want to change the maze itself
+for each training episode. Otherwise, the agent would probably only learn to solve
+one particular maze, not mazes in general.
 
 ### Agent
 
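Building on the snippet added in this hunk, a slightly fuller, purely illustrative version of `EnvironmentReset` might move a goal object; the `goal` field and the position ranges below are assumptions made for the sake of the example:

```csharp
using MLAgents;
using UnityEngine;

public class MySceneBehavior : MonoBehaviour
{
    // Assumed scene reference, assigned in the Inspector.
    public Transform goal;

    public void Awake()
    {
        var academy = FindObjectOfType<Academy>();
        academy.LazyInitialization();
        academy.OnEnvironmentReset += EnvironmentReset;
    }

    void EnvironmentReset()
    {
        // Move the goal to a random spot on the floor at the start of each episode.
        goal.position = new Vector3(Random.Range(-4f, 4f), 0.5f, Random.Range(-4f, 4f));
    }
}
```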
docs/ML-Agents-Overview.md

Lines changed: 1 addition & 3 deletions
@@ -139,9 +139,7 @@ organize the Unity scene:
   receives and assigning a reward (positive / negative) when appropriate. Each
   Agent is linked to a Policy.
 - **Academy** - which orchestrates the observation and decision making process.
-  Within the Academy, several environment-wide parameters such as the rendering
-  quality and the speed at which the environment is run can be specified. The
-  External Communicator lives within the Academy.
+  The External Communicator lives within the Academy.
 
 Every Learning Environment will always have one global Academy and one Agent for
 every character in the scene. While each Agent must be linked to a Policy, it is

docs/Migrating.md

Lines changed: 19 additions & 4 deletions
@@ -10,10 +10,19 @@ The versions can be found in
 ## Migrating from 0.13 to latest
 
 ### Important changes
+* The Academy class was changed to be sealed and its virtual methods were removed.
 * Trainer steps are now counted per-Agent, not per-environment as in previous versions. For instance, if you have 10 Agents in the scene, 20 environment steps now corresponds to 200 steps as printed in the terminal and in Tensorboard.
 
 ### Steps to Migrate
 * Multiply `max_steps` and `summary_steps` in your `trainer_config.yaml` by the number of Agents in the scene.
+* If you have a class that inherits from Academy:
+  * If the class didn't override any of the virtual methods and didn't store any additional data, you can just replace the instance of it in the scene with an Academy.
+  * If the class had additional data, create a new MonoBehaviour and store the data on this instead.
+  * If the class overrode the virtual methods, create a new MonoBehaviour and move the logic to it:
+    * Move the InitializeAcademy code to MonoBehaviour.Awake
+    * Move the AcademyStep code to MonoBehaviour.FixedUpdate
+    * Move the OnDestroy code to MonoBehaviour.OnDestroy or add it to the Academy.DestroyAction action.
+    * Move the AcademyReset code to a new method and add it to the Academy.OnEnvironmentReset action.
 
 ## Migrating from ML-Agents toolkit v0.12.0 to v0.13.0
 
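As one hedged illustration of the migration steps above (the class name and the specifics of the old subclass are hypothetical, not part of the toolkit), a former Academy subclass that overrode the virtual methods might be converted to a MonoBehaviour along these lines:

```csharp
using MLAgents;
using UnityEngine;

// Replaces a former Academy subclass; the Academy in the scene is now the sealed built-in component.
public class MySceneManager : MonoBehaviour
{
    Academy m_Academy;

    public void Awake()
    {
        // Former InitializeAcademy() logic goes here.
        m_Academy = FindObjectOfType<Academy>();
        m_Academy.LazyInitialization();

        // Former AcademyReset() logic is registered on the reset action.
        m_Academy.OnEnvironmentReset += EnvironmentReset;
    }

    public void FixedUpdate()
    {
        // Former AcademyStep() logic goes here.
    }

    void EnvironmentReset()
    {
        // Former AcademyReset() logic goes here.
    }
}
```
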
@@ -22,7 +31,8 @@ The versions can be found in
 * `reset()` on the Low-Level Python API no longer takes a `train_mode` argument. To modify the performance/speed of the engine, you must use an `EngineConfigurationChannel`
 * `reset()` on the Low-Level Python API no longer takes a `config` argument. `UnityEnvironment` no longer has a `reset_parameters` field. To modify float properties in the environment, you must use a `FloatPropertiesChannel`. For more information, refer to the [Low Level Python API documentation](Python-API.md)
 * `CustomResetParameters` are now removed.
-* The Academy no longer has a `Training Configuration` nor `Inference Configuration` field in the inspector. To modify the configuration from the Low-Level Python API, use an `EngineConfigurationChannel`. To modify it during training, use the new command line arguments `--width`, `--height`, `--quality-level`, `--time-scale` and `--target-frame-rate` in `mlagents-learn`.
+* The Academy no longer has a `Training Configuration` nor `Inference Configuration` field in the inspector. To modify the configuration from the Low-Level Python API, use an `EngineConfigurationChannel`.
+To modify it during training, use the new command line arguments `--width`, `--height`, `--quality-level`, `--time-scale` and `--target-frame-rate` in `mlagents-learn`.
 * The Academy no longer has a `Default Reset Parameters` field in the inspector. The Academy class no longer has a `ResetParameters`. To access shared float properties with Python, use the new `FloatProperties` field on the Academy.
 * Offline Behavioral Cloning has been removed. To learn from demonstrations, use the GAIL and
 Behavioral Cloning features with either PPO or SAC. See [Imitation Learning](Training-Imitation-Learning.md) for more information.

@@ -46,7 +56,9 @@ Behavioral Cloning features with either PPO or SAC. See [Imitation Learning](Tra
 * Barracuda was upgraded to 0.3.2, and it is now installed via the Unity Package Manager.
 
 ### Steps to Migrate
-* We [fixed a bug](https://github.com/Unity-Technologies/ml-agents/pull/2823) in `RayPerception3d.Perceive()` that was causing the `endOffset` to be used incorrectly. However this may produce different behavior from previous versions if you use a non-zero `startOffset`. To reproduce the old behavior, you should increase the the value of `endOffset` by `startOffset`. You can verify your raycasts are performing as expected in scene view using the debug rays.
+* We [fixed a bug](https://github.com/Unity-Technologies/ml-agents/pull/2823) in `RayPerception3d.Perceive()` that was causing the `endOffset` to be used incorrectly. However this may produce different behavior from previous versions if you use a non-zero `startOffset`.
+To reproduce the old behavior, you should increase the value of `endOffset` by `startOffset`.
+You can verify your raycasts are performing as expected in scene view using the debug rays.
 * If you use RayPerception3D, replace it with RayPerceptionSensorComponent3D (and similarly for 2D). The settings, such as ray angles and detectable tags, are configured on the component now.
 RayPerception3D would contribute `(# of rays) * (# of tags + 2)` to the State Size in Behavior Parameters, but this is no longer necessary, so you should reduce the State Size by this amount.
 Making this change will require retraining your model, since the observations that RayPerceptionSensorComponent3D produces are different from the old behavior.

@@ -68,7 +80,8 @@ Making this change will require retraining your model, since the observations th
 #### Steps to Migrate
 * In order to be able to train, make sure both your ML-Agents Python package and UnitySDK code come from the v0.11 release. Training will not work, for example, if you update the ML-Agents Python package, and only update the API Version in UnitySDK.
 * If your Agents used visual observations, you must add a CameraSensorComponent corresponding to each old Camera in the Agent's camera list (and similarly for RenderTextures).
-* Since Brain ScriptableObjects have been removed, you will need to delete all the Brain ScriptableObjects from your `Assets` folder. Then, add a `Behavior Parameters` component to each `Agent` GameObject. You will then need to complete the fields on the new `Behavior Parameters` component with the BrainParameters of the old Brain.
+* Since Brain ScriptableObjects have been removed, you will need to delete all the Brain ScriptableObjects from your `Assets` folder. Then, add a `Behavior Parameters` component to each `Agent` GameObject.
+You will then need to complete the fields on the new `Behavior Parameters` component with the BrainParameters of the old Brain.
 
 ## Migrating from ML-Agents toolkit v0.9 to v0.10
 

@@ -79,7 +92,9 @@ Making this change will require retraining your model, since the observations th
 #### Steps to Migrate
 * `UnitySDK/Assets/ML-Agents/Scripts/Communicator.cs` and its class `Communicator` have been renamed to `UnitySDK/Assets/ML-Agents/Scripts/ICommunicator.cs` and `ICommunicator` respectively.
 * The `SpaceType` Enums `discrete`, and `continuous` have been renamed to `Discrete` and `Continuous`.
-* We have removed the `Done` call as well as the capacity to set `Max Steps` on the Academy. Therefore an AcademyReset will never be triggered from C# (only from Python). If you want to reset the simulation after a fixed number of steps, or when an event in the simulation occurs, we recommend looking at our multi-agent example environments (such as BananaCollector). In our examples, groups of Agents can be reset through an "Area" that can reset groups of Agents.
+* We have removed the `Done` call as well as the capacity to set `Max Steps` on the Academy. Therefore an AcademyReset will never be triggered from C# (only from Python). If you want to reset the simulation after a
+fixed number of steps, or when an event in the simulation occurs, we recommend looking at our multi-agent example environments (such as BananaCollector).
+In our examples, groups of Agents can be reset through an "Area" that can reset groups of Agents.
 * The import for `mlagents.envs.UnityEnvironment` was removed. If you are using the Python API, change `from mlagents_envs import UnityEnvironment` to `from mlagents_envs.environment import UnityEnvironment`.
 