
Commit 61a9548

Author: Chris Elion
Doc changes for making Academy non-virtual (#3195)
Parent: 839794a

9 files changed: +65 −143 lines

docs/Feature-Monitor.md

Lines changed: 3 additions & 3 deletions
@@ -9,13 +9,13 @@ You can track many different things both related and unrelated to the agents
 themselves. By default, the Monitor is only active in the *inference* phase, so
 not during training. To change this behavior, you can activate or deactivate it
 by calling `SetActive(boolean)`. For example to also show the monitor during
-training, you can call it in the `InitializeAcademy()` method of your `Academy`:
+training, you can call it in the `Awake()` method of your `MonoBehaviour`:
 
 ```csharp
 using MLAgents;
 
-public class YourAcademy : Academy {
-    public override void InitializeAcademy()
+public class MyBehaviour : MonoBehaviour {
+    public void Awake()
     {
         Monitor.SetActive(true);
     }
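
The hunk above cuts the example off before the closing braces. Read in full, the new pattern would look roughly like this minimal sketch (the class name is just the docs' placeholder):

```csharp
using MLAgents;
using UnityEngine;

public class MyBehaviour : MonoBehaviour
{
    public void Awake()
    {
        // Show the Monitor during training as well as inference.
        Monitor.SetActive(true);
    }
}
```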

docs/Getting-Started-with-Balance-Ball.md

Lines changed: 1 addition & 13 deletions
@@ -50,19 +50,7 @@ to speed up training since all twelve agents contribute to training in parallel.
 
 ### Academy
 
-The Academy object for the scene is placed on the Ball3DAcademy GameObject. Since
-the base Academy class is abstract, you must always define a subclass. There are
-three functions you can implement, though they are all optional:
-
-* Academy.InitializeAcademy() — Called once when the environment is launched.
-* Academy.AcademyStep() — Called at every simulation step before
-  agent.AgentAction() (and after the Agents collect their observations).
-* Academy.AcademyReset() — Called when the Academy starts or restarts the
-  simulation (including the first time).
-
-The 3D Balance Ball environment does not use these functions — each Agent resets
-itself when needed — but many environments do use these functions to control the
-environment around the Agents.
+The Academy object for the scene is placed on the Ball3DAcademy GameObject.
 
 ### Agent

docs/Learning-Environment-Create-New.md

Lines changed: 6 additions & 45 deletions
@@ -17,10 +17,8 @@ steps:
 1. Create an environment for your agents to live in. An environment can range
    from a simple physical simulation containing a few objects to an entire game
    or ecosystem.
-2. Implement an Academy subclass and add it to a GameObject in the Unity scene
-   containing the environment. Your Academy class can implement a few optional
-   methods to update the scene independently of any agents. For example, you can
-   add, move, or delete agents and other entities in the environment.
+2. Add an Academy MonoBehaviour to a GameObject in the Unity scene
+   containing the environment.
 3. Implement your Agent subclasses. An Agent subclass defines the code an Agent
    uses to observe its environment, to carry out assigned actions, and to
    calculate the rewards used for reinforcement training. You can also implement

@@ -115,46 +113,16 @@ component later in the tutorial.
 You can adjust the camera angles to give a better view of the scene at runtime.
 The next steps will be to create and add the ML-Agent components.
 
-## Implement an Academy
-
+## Add an Academy
 The Academy object coordinates the ML-Agents in the scene and drives the
 decision-making portion of the simulation loop. Every ML-Agent scene needs one
-Academy instance. Since the base Academy class is abstract, you must make your
-own subclass even if you don't need to use any of the methods for a particular
-environment.
+(and only one) Academy instance.
 
-First, add a New Script component to the Academy GameObject created earlier:
+First, add an Academy component to the Academy GameObject created earlier:
 
 1. Select the Academy GameObject to view it in the Inspector window.
 2. Click **Add Component**.
-3. Click **New Script** in the list of components (at the bottom).
-4. Name the script "RollerAcademy".
-5. Click **Create and Add**.
-
-Next, edit the new `RollerAcademy` script:
-
-1. In the Unity Project window, double-click the `RollerAcademy` script to open
-   it in your code editor. (By default new scripts are placed directly in the
-   **Assets** folder.)
-2. In the code editor, add the statement, `using MLAgents;`.
-3. Change the base class from `MonoBehaviour` to `Academy`.
-4. Delete the `Start()` and `Update()` methods that were added by default.
-
-In such a basic scene, we don't need the Academy to initialize, reset, or
-otherwise control any objects in the environment so we have the simplest
-possible Academy implementation:
-
-```csharp
-using MLAgents;
-
-public class RollerAcademy : Academy { }
-```
-
-The default settings for the Academy properties are also fine for this
-environment, so we don't need to change anything for the RollerAcademy component
-in the Inspector window.
-
-![The Academy properties](images/mlagents-NewTutAcademy.png)
+3. Select **Academy** in the list of components.
 
 ## Implement an Agent
 

@@ -179,13 +147,6 @@ So far, these are the basic steps that you would use to add ML-Agents to any
 Unity project. Next, we will add the logic that will let our Agent learn to roll
 to the cube using reinforcement learning.
 
-In this simple scenario, we don't use the Academy object to control the
-environment. If we wanted to change the environment, for example change the size
-of the floor or add or remove agents or other objects before or during the
-simulation, we could implement the appropriate methods in the Academy. Instead,
-we will have the Agent do all the work of resetting itself and the target when
-it succeeds or falls trying.
-
 ### Initialization and Resetting the Agent
 
 When the Agent reaches its target, it marks itself done and its Agent reset
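
With the Academy out of the reset path, the reset logic in this tutorial lives entirely on the Agent. As a rough, non-authoritative sketch of what that looks like (mirroring the `RollerAgent` built later in the tutorial), the Agent's own `AgentReset()` override handles the fallen/starting case:

```csharp
using MLAgents;
using UnityEngine;

public class RollerAgent : Agent
{
    Rigidbody m_RBody;

    void Start()
    {
        m_RBody = GetComponent<Rigidbody>();
    }

    public override void AgentReset()
    {
        if (transform.position.y < 0)
        {
            // The Agent fell off the platform: zero its momentum and move it back up.
            m_RBody.angularVelocity = Vector3.zero;
            m_RBody.velocity = Vector3.zero;
            transform.position = new Vector3(0, 0.5f, 0);
        }
    }
}
```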

docs/Learning-Environment-Design-Academy.md

Lines changed: 0 additions & 49 deletions
This file was deleted.

docs/Learning-Environment-Design.md

Lines changed: 34 additions & 24 deletions
@@ -39,15 +39,14 @@ use.
 
 The ML-Agents Academy class orchestrates the agent simulation loop as follows:
 
-1. Calls your Academy subclass's `AcademyReset()` function.
+1. Calls your Academy's `OnEnvironmentReset` delegate.
 2. Calls the `AgentReset()` function for each Agent in the scene.
 3. Calls the `CollectObservations()` function for each Agent in the scene.
 4. Uses each Agent's Policy to decide on the Agent's next action.
-5. Calls your subclass's `AcademyStep()` function.
-6. Calls the `AgentAction()` function for each Agent in the scene, passing in
+5. Calls the `AgentAction()` function for each Agent in the scene, passing in
    the action chosen by the Agent's Policy. (This function is not called if the
    Agent is done.)
-7. Calls the Agent's `AgentOnDone()` function if the Agent has reached its `Max
+6. Calls the Agent's `AgentOnDone()` function if the Agent has reached its `Max
    Step` count or has otherwise marked itself as `done`. Optionally, you can set
    an Agent to restart if it finishes before the end of an episode. In this
    case, the Academy calls the `AgentReset()` function.

@@ -57,7 +56,7 @@ implement the above methods. The `Agent.CollectObservations()` and
 `Agent.AgentAction()` functions are required; the other methods are optional —
 whether you need to implement them or not depends on your specific scenario.
 
-**Note:** The API used by the Python PPO training process to communicate with
+**Note:** The API used by the Python training process to communicate with
 and control the Academy during training can be used for other purposes as well.
 For example, you could use the API to use Unity as the simulation engine for
 your own machine learning algorithms. See [Python API](Python-API.md) for more

@@ -66,32 +65,43 @@ information.
 ## Organizing the Unity Scene
 
 To train and use the ML-Agents toolkit in a Unity scene, the scene must contain
-a single Academy subclass and as many Agent subclasses
-as you need.
+a single Academy and as many Agent subclasses as you need.
 Agent instances should be attached to the GameObject representing that Agent.
 
 ### Academy
 
 The Academy object orchestrates Agents and their decision making processes. Only
 place a single Academy object in a scene.
 
-You must create a subclass of the Academy class (since the base class is
-abstract). When you create your Academy subclass, you can implement the
-following methods (all are optional):
-
-* `InitializeAcademy()` — Prepare the environment the first time it launches.
-* `AcademyReset()` — Prepare the environment and Agents for the next training
-  episode. Use this function to place and initialize entities in the scene as
-  necessary.
-* `AcademyStep()` — Prepare the environment for the next simulation step. The
-  base Academy class calls this function before calling any `AgentAction()`
-  methods for the current step. You can use this function to update other
-  objects in the scene before the Agents take their actions. Note that the
-  Agents have already collected their observations and chosen an action before
-  the Academy invokes this method.
-
-See [Academy](Learning-Environment-Design-Academy.md) for a complete list of
-the Academy properties and their uses.
+#### Academy resetting
+To alter the environment at the start of each episode, add your method to the Academy's OnEnvironmentReset action.
+
+```csharp
+public class MySceneBehavior : MonoBehaviour
+{
+    public void Awake()
+    {
+        var academy = FindObjectOfType<Academy>();
+        academy.LazyInitialization();
+        academy.OnEnvironmentReset += EnvironmentReset;
+    }
+
+    void EnvironmentReset()
+    {
+        // Reset the scene here
+    }
+}
+```
+
+For example, you might want to reset an Agent to its starting
+position or move a goal to a random position. An environment resets when the
+`reset()` method is called on the Python `UnityEnvironment`.
+
+When you reset an environment, consider the factors that should change so that
+training is generalizable to different conditions. For example, if you were
+training a maze-solving agent, you would probably want to change the maze itself
+for each training episode. Otherwise, the agent would probably only learn to solve
+one particular maze, not mazes in general.
 
 ### Agent
 
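Building on the snippet added in this hunk, a slightly fuller, purely illustrative version of `EnvironmentReset` might move a goal object; the `goal` field and the position ranges below are assumptions made for the sake of the example:

```csharp
using MLAgents;
using UnityEngine;

public class MySceneBehavior : MonoBehaviour
{
    // Assumed scene reference, assigned in the Inspector.
    public Transform goal;

    public void Awake()
    {
        var academy = FindObjectOfType<Academy>();
        academy.LazyInitialization();
        academy.OnEnvironmentReset += EnvironmentReset;
    }

    void EnvironmentReset()
    {
        // Move the goal to a random spot on the floor at the start of each episode.
        goal.position = new Vector3(Random.Range(-4f, 4f), 0.5f, Random.Range(-4f, 4f));
    }
}
```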
docs/ML-Agents-Overview.md

Lines changed: 1 addition & 3 deletions
@@ -139,9 +139,7 @@ organize the Unity scene:
   receives and assigning a reward (positive / negative) when appropriate. Each
   Agent is linked to a Policy.
 - **Academy** - which orchestrates the observation and decision making process.
-  Within the Academy, several environment-wide parameters such as the rendering
-  quality and the speed at which the environment is run can be specified. The
-  External Communicator lives within the Academy.
+  The External Communicator lives within the Academy.
 
 Every Learning Environment will always have one global Academy and one Agent for
 every character in the scene. While each Agent must be linked to a Policy, it is

docs/Migrating.md

Lines changed: 19 additions & 4 deletions
@@ -10,10 +10,19 @@ The versions can be found in
 ## Migrating from 0.13 to latest
 
 ### Important changes
+* The Academy class was changed to be sealed and its virtual methods were removed.
 * Trainer steps are now counted per-Agent, not per-environment as in previous versions. For instance, if you have 10 Agents in the scene, 20 environment steps now corresponds to 200 steps as printed in the terminal and in Tensorboard.
 
 ### Steps to Migrate
 * Multiply `max_steps` and `summary_steps` in your `trainer_config.yaml` by the number of Agents in the scene.
+* If you have a class that inherits from Academy:
+  * If the class didn't override any of the virtual methods and didn't store any additional data, you can just replace the instance of it in the scene with an Academy.
+  * If the class had additional data, create a new MonoBehaviour and store the data on this instead.
+  * If the class overrode the virtual methods, create a new MonoBehaviour and move the logic to it:
+    * Move the InitializeAcademy code to MonoBehaviour.Awake
+    * Move the AcademyStep code to MonoBehaviour.FixedUpdate
+    * Move the OnDestroy code to MonoBehaviour.OnDestroy or add it to the Academy.DestroyAction action.
+    * Move the AcademyReset code to a new method and add it to the Academy.OnEnvironmentReset action.
 
 ## Migrating from ML-Agents toolkit v0.12.0 to v0.13.0
 
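As one hedged illustration of the migration steps above (the class name and the specifics of the old subclass are hypothetical, not part of the toolkit), a former Academy subclass that overrode the virtual methods might be converted to a MonoBehaviour along these lines:

```csharp
using MLAgents;
using UnityEngine;

// Replaces a former Academy subclass; the Academy in the scene is now the sealed built-in component.
public class MySceneManager : MonoBehaviour
{
    Academy m_Academy;

    public void Awake()
    {
        // Former InitializeAcademy() logic goes here.
        m_Academy = FindObjectOfType<Academy>();
        m_Academy.LazyInitialization();

        // Former AcademyReset() logic is registered on the reset action.
        m_Academy.OnEnvironmentReset += EnvironmentReset;
    }

    public void FixedUpdate()
    {
        // Former AcademyStep() logic goes here.
    }

    void EnvironmentReset()
    {
        // Former AcademyReset() logic goes here.
    }
}
```
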
@@ -22,7 +31,8 @@ The versions can be found in
 * `reset()` on the Low-Level Python API no longer takes a `train_mode` argument. To modify the performance/speed of the engine, you must use an `EngineConfigurationChannel`
 * `reset()` on the Low-Level Python API no longer takes a `config` argument. `UnityEnvironment` no longer has a `reset_parameters` field. To modify float properties in the environment, you must use a `FloatPropertiesChannel`. For more information, refer to the [Low Level Python API documentation](Python-API.md)
 * `CustomResetParameters` are now removed.
-* The Academy no longer has a `Training Configuration` nor `Inference Configuration` field in the inspector. To modify the configuration from the Low-Level Python API, use an `EngineConfigurationChannel`. To modify it during training, use the new command line arguments `--width`, `--height`, `--quality-level`, `--time-scale` and `--target-frame-rate` in `mlagents-learn`.
+* The Academy no longer has a `Training Configuration` nor `Inference Configuration` field in the inspector. To modify the configuration from the Low-Level Python API, use an `EngineConfigurationChannel`.
+To modify it during training, use the new command line arguments `--width`, `--height`, `--quality-level`, `--time-scale` and `--target-frame-rate` in `mlagents-learn`.
 * The Academy no longer has a `Default Reset Parameters` field in the inspector. The Academy class no longer has a `ResetParameters`. To access shared float properties with Python, use the new `FloatProperties` field on the Academy.
 * Offline Behavioral Cloning has been removed. To learn from demonstrations, use the GAIL and
 Behavioral Cloning features with either PPO or SAC. See [Imitation Learning](Training-Imitation-Learning.md) for more information.

@@ -46,7 +56,9 @@ Behavioral Cloning features with either PPO or SAC. See [Imitation Learning](Tra
 * Barracuda was upgraded to 0.3.2, and it is now installed via the Unity Package Manager.
 
 ### Steps to Migrate
-* We [fixed a bug](https://github.com/Unity-Technologies/ml-agents/pull/2823) in `RayPerception3d.Perceive()` that was causing the `endOffset` to be used incorrectly. However this may produce different behavior from previous versions if you use a non-zero `startOffset`. To reproduce the old behavior, you should increase the the value of `endOffset` by `startOffset`. You can verify your raycasts are performing as expected in scene view using the debug rays.
+* We [fixed a bug](https://github.com/Unity-Technologies/ml-agents/pull/2823) in `RayPerception3d.Perceive()` that was causing the `endOffset` to be used incorrectly. However this may produce different behavior from previous versions if you use a non-zero `startOffset`.
+To reproduce the old behavior, you should increase the value of `endOffset` by `startOffset`.
+You can verify your raycasts are performing as expected in scene view using the debug rays.
 * If you use RayPerception3D, replace it with RayPerceptionSensorComponent3D (and similarly for 2D). The settings, such as ray angles and detectable tags, are configured on the component now.
 RayPerception3D would contribute `(# of rays) * (# of tags + 2)` to the State Size in Behavior Parameters, but this is no longer necessary, so you should reduce the State Size by this amount.
 Making this change will require retraining your model, since the observations that RayPerceptionSensorComponent3D produces are different from the old behavior.

@@ -68,7 +80,8 @@ Making this change will require retraining your model, since the observations th
 #### Steps to Migrate
 * In order to be able to train, make sure both your ML-Agents Python package and UnitySDK code come from the v0.11 release. Training will not work, for example, if you update the ML-Agents Python package, and only update the API Version in UnitySDK.
 * If your Agents used visual observations, you must add a CameraSensorComponent corresponding to each old Camera in the Agent's camera list (and similarly for RenderTextures).
-* Since Brain ScriptableObjects have been removed, you will need to delete all the Brain ScriptableObjects from your `Assets` folder. Then, add a `Behavior Parameters` component to each `Agent` GameObject. You will then need to complete the fields on the new `Behavior Parameters` component with the BrainParameters of the old Brain.
+* Since Brain ScriptableObjects have been removed, you will need to delete all the Brain ScriptableObjects from your `Assets` folder. Then, add a `Behavior Parameters` component to each `Agent` GameObject.
+You will then need to complete the fields on the new `Behavior Parameters` component with the BrainParameters of the old Brain.
 
 ## Migrating from ML-Agents toolkit v0.9 to v0.10
 

@@ -79,7 +92,9 @@ Making this change will require retraining your model, since the observations th
 #### Steps to Migrate
 * `UnitySDK/Assets/ML-Agents/Scripts/Communicator.cs` and its class `Communicator` have been renamed to `UnitySDK/Assets/ML-Agents/Scripts/ICommunicator.cs` and `ICommunicator` respectively.
 * The `SpaceType` Enums `discrete`, and `continuous` have been renamed to `Discrete` and `Continuous`.
-* We have removed the `Done` call as well as the capacity to set `Max Steps` on the Academy. Therefore an AcademyReset will never be triggered from C# (only from Python). If you want to reset the simulation after a fixed number of steps, or when an event in the simulation occurs, we recommend looking at our multi-agent example environments (such as BananaCollector). In our examples, groups of Agents can be reset through an "Area" that can reset groups of Agents.
+* We have removed the `Done` call as well as the capacity to set `Max Steps` on the Academy. Therefore an AcademyReset will never be triggered from C# (only from Python). If you want to reset the simulation after a
+fixed number of steps, or when an event in the simulation occurs, we recommend looking at our multi-agent example environments (such as BananaCollector).
+In our examples, groups of Agents can be reset through an "Area" that can reset groups of Agents.
 * The import for `mlagents.envs.UnityEnvironment` was removed. If you are using the Python API, change `from mlagents_envs import UnityEnvironment` to `from mlagents_envs.environment import UnityEnvironment`.
 