
Commit db97bf6
Author: Chris Elion
Academy singleton docs (#3218)
1 parent 6d70c0d

15 files changed: +23 -57 lines

docs/FAQ.md

Lines changed: 1 addition & 1 deletion

@@ -94,7 +94,7 @@ UnityEnvironment(file_name=filename, worker_id=X)
 
 If you receive a message `Mean reward : nan` when attempting to train a model
 using PPO, this is due to the episodes of the Learning Environment not
-terminating. In order to address this, set `Max Steps` for either the Academy or
+terminating. In order to address this, set `Max Steps` for the
 Agents within the Scene Inspector to a value greater than 0. Alternatively, it
 is possible to manually set `done` conditions for episodes from within scripts
 for custom episode-terminating events.
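For reference, a minimal sketch of such a manual `done` condition, assuming this release's `Agent.Done()` and `Agent.SetReward()` methods; the fall check is a hypothetical episode-terminating event:

```csharp
using MLAgents;
using UnityEngine;

public class MyAgent : Agent
{
    public override void AgentAction(float[] vectorAction)
    {
        // Hypothetical terminating event: end the episode manually
        // instead of waiting for Max Steps to elapse.
        if (transform.position.y < 0f)
        {
            SetReward(-1.0f);
            Done();  // marks this Agent's episode as finished
        }
    }
}
```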

docs/Getting-Started-with-Balance-Ball.md

Lines changed: 0 additions & 3 deletions

@@ -48,9 +48,6 @@ it contains not one, but several agent cubes. Each agent cube in the scene is a
 independent agent, but they all share the same Behavior. 3D Balance Ball does this
 to speed up training since all twelve agents contribute to training in parallel.
 
-### Academy
-
-The Academy object for the scene is placed on the Ball3DAcademy GameObject.
 
 ### Agent
 

docs/Glossary.md

Lines changed: 2 additions & 2 deletions

@@ -1,6 +1,6 @@
 # ML-Agents Toolkit Glossary
 
-* **Academy** - Unity Component which controls timing, reset, and
+* **Academy** - Singleton object which controls timing, reset, and
   training/inference settings of the environment.
 * **Action** - The carrying-out of a decision on the part of an agent within the
   environment.
@@ -12,7 +12,7 @@
   carried out given an observation.
 * **Editor** - The Unity Editor, which may include any pane (e.g. Hierarchy,
   Scene, Inspector).
-* **Environment** - The Unity scene which contains Agents and the Academy.
+* **Environment** - The Unity scene which contains Agents.
 * **FixedUpdate** - Unity method called each time the game engine is
   stepped. ML-Agents logic should be placed here.
 * **Frame** - An instance of rendering the main camera for the display.

docs/Learning-Environment-Create-New.md

Lines changed: 2 additions & 26 deletions

@@ -17,13 +17,11 @@ steps:
 1. Create an environment for your agents to live in. An environment can range
    from a simple physical simulation containing a few objects to an entire game
    or ecosystem.
-2. Add an Academy MonoBehaviour to a GameObject in the Unity scene
-   containing the environment.
-3. Implement your Agent subclasses. An Agent subclass defines the code an Agent
+2. Implement your Agent subclasses. An Agent subclass defines the code an Agent
    uses to observe its environment, to carry out assigned actions, and to
    calculate the rewards used for reinforcement training. You can also implement
    optional methods to reset the Agent when it has finished or failed its task.
-4. Add your Agent subclasses to appropriate GameObjects, typically, the object
+3. Add your Agent subclasses to appropriate GameObjects, typically, the object
    in the scene that represents the Agent in the simulation.
 
 **Note:** If you are unfamiliar with Unity, refer to
@@ -103,27 +101,6 @@ different material from the list of all materials currently in the project.)
 Note that we will create an Agent subclass to add to this GameObject as a
 component later in the tutorial.
 
-### Add an Empty GameObject to Hold the Academy
-
-1. Right click in Hierarchy window, select Create Empty.
-2. Name the GameObject "Academy"
-
-![The scene hierarchy](images/mlagents-NewTutHierarchy.png)
-
-You can adjust the camera angles to give a better view of the scene at runtime.
-The next steps will be to create and add the ML-Agent components.
-
-## Add an Academy
-The Academy object coordinates the ML-Agents in the scene and drives the
-decision-making portion of the simulation loop. Every ML-Agent scene needs one
-(and only one) Academy instance.
-
-First, add an Academy component to the Academy GameObject created earlier:
-
-1. Select the Academy GameObject to view it in the Inspector window.
-2. Click **Add Component**.
-3. Select **Academy** in the list of components.
-
 ## Implement an Agent
 
 To create the Agent:
@@ -524,7 +501,6 @@ to use Unity ML-Agents: an Academy and one or more Agents.
 
 Keep in mind:
 
-* There can only be one Academy game object in a scene.
 * If you are using multiple training areas, make sure all the Agents have the same `Behavior Name`
   and `Behavior Parameters`
 
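To make steps 2 and 3 of the revised list concrete, a minimal Agent subclass sketch in the style of this tutorial's roller-ball example (class name, observations, and reward logic are illustrative):

```csharp
using MLAgents;
using UnityEngine;

// Illustrative Agent subclass: gathers observations, reacts to actions,
// and resets itself between episodes. Attach it to the GameObject that
// represents the agent in the scene.
public class RollerAgent : Agent
{
    Rigidbody rBody;

    void Start()
    {
        rBody = GetComponent<Rigidbody>();
    }

    public override void CollectObservations()
    {
        // What the Policy sees at each decision (values illustrative).
        AddVectorObs(transform.position);
        AddVectorObs(rBody.velocity.x);
        AddVectorObs(rBody.velocity.z);
    }

    public override void AgentAction(float[] vectorAction)
    {
        // Apply the received continuous actions as a force.
        var force = new Vector3(vectorAction[0], 0f, vectorAction[1]);
        rBody.AddForce(force * 10f);
    }

    public override void AgentReset()
    {
        // Optional: reset state when the Agent finishes or fails its task.
        transform.position = new Vector3(0f, 0.5f, 0f);
        rBody.velocity = Vector3.zero;
    }
}
```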

docs/Learning-Environment-Design.md

Lines changed: 5 additions & 10 deletions

@@ -51,7 +51,7 @@ The ML-Agents Academy class orchestrates the agent simulation loop as follows:
 an Agent to restart if it finishes before the end of an episode. In this
 case, the Academy calls the `AgentReset()` function.
 
-To create a training environment, extend the Academy and Agent classes to
+To create a training environment, extend the Agent class to
 implement the above methods. The `Agent.CollectObservations()` and
 `Agent.AgentAction()` functions are required; the other methods are optional —
 whether you need to implement them or not depends on your specific scenario.
@@ -64,14 +64,13 @@ information.
 
 ## Organizing the Unity Scene
 
-To train and use the ML-Agents toolkit in a Unity scene, the scene must contain
-a single Academy and as many Agent subclasses as you need.
+To train and use the ML-Agents toolkit in a Unity scene, the scene must contain as many Agent subclasses as you need.
 Agent instances should be attached to the GameObject representing that Agent.
 
 ### Academy
 
-The Academy object orchestrates Agents and their decision making processes. Only
-place a single Academy object in a scene.
+The Academy is a singleton which orchestrates Agents and their decision making processes. Only
+a single Academy exists at a time.
 
 #### Academy resetting
 To alter the environment at the start of each episode, add your method to the Academy's OnEnvironmentReset action.
@@ -81,9 +80,7 @@ public class MySceneBehavior : MonoBehaviour
 {
     public void Awake()
     {
-        var academy = FindObjectOfType<Academy>();
-        academy.LazyInitialization();
-        academy.OnEnvironmentReset += EnvironmentReset;
+        Academy.Instance.OnEnvironmentReset += EnvironmentReset;
    }
 
     void EnvironmentReset()
@@ -144,8 +141,6 @@ training and for testing trained agents. Or, you may be training agents to
 operate in a complex game or simulation. In this case, it might be more
 efficient and practical to create a purpose-built training scene.
 
-Both training and testing (or normal game) scenes must contain an Academy object
-to control the agent decision making process.
 When you create a training environment in Unity, you must set up the scene so
 that it can be controlled by the external training process. Considerations
 include:
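Pieced together, a runnable version of the snippet changed above (a sketch: the `MLAgents` namespace matches this release, and the reset body is illustrative):

```csharp
using MLAgents;
using UnityEngine;

public class MySceneBehavior : MonoBehaviour
{
    public void Awake()
    {
        // Accessing Academy.Instance initializes the singleton lazily,
        // so the old FindObjectOfType/LazyInitialization calls are gone.
        Academy.Instance.OnEnvironmentReset += EnvironmentReset;
    }

    void EnvironmentReset()
    {
        // Illustrative: reposition scene objects, regenerate obstacles,
        // or reset counters at the start of each episode.
    }
}
```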

docs/Limitations.md

Lines changed: 4 additions & 3 deletions

@@ -16,12 +16,13 @@ making. See
 [Execution Order of Event Functions](https://docs.unity3d.com/Manual/ExecutionOrder.html)
 for more information.
 
+You can control the frequency of Academy stepping by calling
+`Academy.Instance.DisableAutomaticStepping()`, and then calling
+`Academy.Instance.EnvironmentStep()`.
+
 ## Python API
 
 ### Python version
 
 As of version 0.3, we no longer support Python 2.
 
-### TensorFlow support
-
-Currently the Ml-Agents toolkit uses TensorFlow 1.7.1 only.
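A minimal sketch of manual stepping built on the two calls added above; the every-other-FixedUpdate rate is just an illustrative choice:

```csharp
using MLAgents;
using UnityEngine;

public class ManualStepper : MonoBehaviour
{
    int fixedUpdateCount;

    void Awake()
    {
        // Stop the Academy from stepping itself during FixedUpdate.
        Academy.Instance.DisableAutomaticStepping();
    }

    void FixedUpdate()
    {
        // Step the simulation at half the physics rate (illustrative).
        fixedUpdateCount++;
        if (fixedUpdateCount % 2 == 0)
        {
            Academy.Instance.EnvironmentStep();
        }
    }
}
```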

docs/ML-Agents-Overview.md

Lines changed: 2 additions & 4 deletions

@@ -131,17 +131,15 @@ components:
 
 _Simplified block diagram of ML-Agents._
 
-The Learning Environment contains two additional components that help
+The Learning Environment contains an additional component that helps
 organize the Unity scene:
 
 - **Agents** - which is attached to a Unity GameObject (any character within a
   scene) and handles generating its observations, performing the actions it
   receives and assigning a reward (positive / negative) when appropriate. Each
   Agent is linked to a Policy.
-- **Academy** - which orchestrates the observation and decision making process.
-  The External Communicator lives within the Academy.
 
-Every Learning Environment will always have one global Academy and one Agent for
+Every Learning Environment will always have one Agent for
 every character in the scene. While each Agent must be linked to a Policy, it is
 possible for Agents that have similar observations and actions to have
 the same Policy type. In our sample game, we have two teams each with their own medic.

docs/Migrating.md

Lines changed: 2 additions & 2 deletions

@@ -11,14 +11,14 @@ The versions can be found in
 ## Migrating from 0.13 to latest
 
 ### Important changes
-* The Academy class was changed to be sealed and its virtual methods were removed.
+* The Academy class was changed to a singleton, and its virtual methods were removed.
 * Trainer steps are now counted per-Agent, not per-environment as in previous versions. For instance, if you have 10 Agents in the scene, 20 environment steps now corresponds to 200 steps as printed in the terminal and in Tensorboard.
 * Curriculum config files are now YAML formatted and all curricula for a training run are combined into a single file.
 * The `--num-runs` command-line option has been removed.
 
 ### Steps to Migrate
 * If you have a class that inherits from Academy:
-  * If the class didn't override any of the virtual methods and didn't store any additional data, you can just replace the instance of it in the scene with an Academy.
+  * If the class didn't override any of the virtual methods and didn't store any additional data, you can just remove the old script from the scene.
   * If the class had additional data, create a new MonoBehaviour and store the data on this instead.
   * If the class overrode the virtual methods, create a new MonoBehaviour and move the logic to it:
     * Move the InitializeAcademy code to MonoBehaviour.OnAwake
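A hedged before/after sketch of those migration steps; `MyAcademy` and its overrides are hypothetical, and the replacement uses the `OnEnvironmentReset` hook described in Learning-Environment-Design.md:

```csharp
using MLAgents;
using UnityEngine;

// Before (0.13): a hypothetical subclass overriding the virtual methods.
//
// public class MyAcademy : Academy
// {
//     public override void InitializeAcademy() { /* one-time setup */ }
//     public override void AcademyReset()      { /* per-episode reset */ }
// }

// After: a plain MonoBehaviour hooking the Academy singleton instead.
public class MyAcademyBehavior : MonoBehaviour
{
    void Awake()
    {
        // InitializeAcademy logic moves into Awake.
        Academy.Instance.OnEnvironmentReset += EnvironmentReset;
    }

    void EnvironmentReset()
    {
        // AcademyReset logic moves here.
    }
}
```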

docs/Python-API.md

Lines changed: 1 addition & 2 deletions

@@ -263,7 +263,6 @@ i = env.reset()
 Once a property has been modified in Python, you can access it in C# after the next call to `step` as follows:
 
 ```csharp
-var academy = FindObjectOfType<Academy>();
-var sharedProperties = academy.FloatProperties;
+var sharedProperties = Academy.Instance.FloatProperties;
 float property1 = sharedProperties.GetPropertyWithDefault("parameter_1", 0.0f);
 ```

docs/Training-Curriculum-Learning.md

Lines changed: 2 additions & 2 deletions

@@ -41,8 +41,8 @@ the same environment.
 In order to define the curricula, the first step is to decide which parameters of
 the environment will vary. In the case of the Wall Jump environment,
 the height of the wall is what varies. We define this as a `Shared Float Property`
-that can be accessed in `Academy.FloatProperties`, and by doing so it becomes
-adjustable via the Python API.
+that can be accessed in `Academy.Instance.FloatProperties`, and by doing
+so it becomes adjustable via the Python API.
 Rather than adjusting it by hand, we will create a YAML file which
 describes the structure of the curricula. Within it, we can specify which
 points in the training process our wall height will change, either based on the
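As a sketch of the C# side of that change, reading the shared property when configuring the scene; the `big_wall_height` key and `wall` field are hypothetical:

```csharp
using MLAgents;
using UnityEngine;

public class WallHeightController : MonoBehaviour
{
    public Transform wall;  // hypothetical wall object in the scene

    void ConfigureWall()
    {
        // Curriculum-controlled value set via the Python API; the
        // default applies when no trainer has set it yet.
        var height = Academy.Instance.FloatProperties
            .GetPropertyWithDefault("big_wall_height", 4.0f);
        wall.localScale = new Vector3(wall.localScale.x, height, wall.localScale.z);
    }
}
```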
