diff --git a/docs/FAQ.md b/docs/FAQ.md index 74f3e90506..c5e24564e6 100644 --- a/docs/FAQ.md +++ b/docs/FAQ.md @@ -94,7 +94,7 @@ UnityEnvironment(file_name=filename, worker_id=X) If you receive a message `Mean reward : nan` when attempting to train a model using PPO, this is due to the episodes of the Learning Environment not -terminating. In order to address this, set `Max Steps` for either the Academy or +terminating. In order to address this, set `Max Steps` for the Agents within the Scene Inspector to a value greater than 0. Alternatively, it is possible to manually set `done` conditions for episodes from within scripts for custom episode-terminating events. diff --git a/docs/Getting-Started-with-Balance-Ball.md b/docs/Getting-Started-with-Balance-Ball.md index ad7b53a19e..aba4b8b870 100644 --- a/docs/Getting-Started-with-Balance-Ball.md +++ b/docs/Getting-Started-with-Balance-Ball.md @@ -48,9 +48,6 @@ it contains not one, but several agent cubes. Each agent cube in the scene is a independent agent, but they all share the same Behavior. 3D Balance Ball does this to speed up training since all twelve agents contribute to training in parallel. -### Academy - -The Academy object for the scene is placed on the Ball3DAcademy GameObject. ### Agent diff --git a/docs/Glossary.md b/docs/Glossary.md index 7db920ee51..171a26286a 100644 --- a/docs/Glossary.md +++ b/docs/Glossary.md @@ -1,6 +1,6 @@ # ML-Agents Toolkit Glossary -* **Academy** - Unity Component which controls timing, reset, and +* **Academy** - Singleton object which controls timing, reset, and training/inference settings of the environment. * **Action** - The carrying-out of a decision on the part of an agent within the environment. @@ -12,7 +12,7 @@ carried out given an observation. * **Editor** - The Unity Editor, which may include any pane (e.g. Hierarchy, Scene, Inspector). -* **Environment** - The Unity scene which contains Agents and the Academy. +* **Environment** - The Unity scene which contains Agents. * **FixedUpdate** - Unity method called each time the game engine is stepped. ML-Agents logic should be placed here. * **Frame** - An instance of rendering the main camera for the display. diff --git a/docs/Learning-Environment-Create-New.md b/docs/Learning-Environment-Create-New.md index 9beb60a988..eef62df705 100644 --- a/docs/Learning-Environment-Create-New.md +++ b/docs/Learning-Environment-Create-New.md @@ -17,13 +17,11 @@ steps: 1. Create an environment for your agents to live in. An environment can range from a simple physical simulation containing a few objects to an entire game or ecosystem. -2. Add an Academy MonoBehaviour to a GameObject in the Unity scene - containing the environment. -3. Implement your Agent subclasses. An Agent subclass defines the code an Agent +2. Implement your Agent subclasses. An Agent subclass defines the code an Agent uses to observe its environment, to carry out assigned actions, and to calculate the rewards used for reinforcement training. You can also implement optional methods to reset the Agent when it has finished or failed its task. -4. Add your Agent subclasses to appropriate GameObjects, typically, the object +3. Add your Agent subclasses to appropriate GameObjects, typically, the object in the scene that represents the Agent in the simulation. **Note:** If you are unfamiliar with Unity, refer to @@ -103,27 +101,6 @@ different material from the list of all materials currently in the project.) 
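As a companion to the FAQ change above, the sketch below shows one way to end an episode manually from an Agent script when `Max Steps` is left at 0, so that training reports a mean reward instead of `nan`. This is an illustration only: it assumes the `Agent.Done()` / `AddReward()` API and the `AgentAction(float[])` override referenced elsewhere in these docs, and the fall-below-the-platform condition is hypothetical.

```csharp
using MLAgents;
using UnityEngine;

// Illustrative only: marks the episode done on a custom terminating event.
public class FallingAgent : Agent
{
    public override void AgentAction(float[] vectorAction)
    {
        // ... apply vectorAction to the GameObject here ...

        // Hypothetical custom episode-terminating event.
        if (transform.localPosition.y < 0f)
        {
            AddReward(-1f);
            Done();  // end the episode; AgentReset() runs before the next one
        }
    }
}
```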
Note that we will create an Agent subclass to add to this GameObject as a component later in the tutorial. -### Add an Empty GameObject to Hold the Academy - -1. Right click in Hierarchy window, select Create Empty. -2. Name the GameObject "Academy" - -![The scene hierarchy](images/mlagents-NewTutHierarchy.png) - -You can adjust the camera angles to give a better view of the scene at runtime. -The next steps will be to create and add the ML-Agent components. - -## Add an Academy -The Academy object coordinates the ML-Agents in the scene and drives the -decision-making portion of the simulation loop. Every ML-Agent scene needs one -(and only one) Academy instance. - -First, add an Academy component to the Academy GameObject created earlier: - -1. Select the Academy GameObject to view it in the Inspector window. -2. Click **Add Component**. -3. Select **Academy** in the list of components. - ## Implement an Agent To create the Agent: @@ -524,7 +501,6 @@ to use Unity ML-Agents: an Academy and one or more Agents. Keep in mind: -* There can only be one Academy game object in a scene. * If you are using multiple training areas, make sure all the Agents have the same `Behavior Name` and `Behavior Parameters` diff --git a/docs/Learning-Environment-Design.md b/docs/Learning-Environment-Design.md index 2e7e7fd128..a810677fd4 100644 --- a/docs/Learning-Environment-Design.md +++ b/docs/Learning-Environment-Design.md @@ -51,7 +51,7 @@ The ML-Agents Academy class orchestrates the agent simulation loop as follows: an Agent to restart if it finishes before the end of an episode. In this case, the Academy calls the `AgentReset()` function. -To create a training environment, extend the Academy and Agent classes to +To create a training environment, extend the Agent class to implement the above methods. The `Agent.CollectObservations()` and `Agent.AgentAction()` functions are required; the other methods are optional; whether you need to implement them or not depends on your specific scenario. @@ -64,14 +64,13 @@ information. ## Organizing the Unity Scene -To train and use the ML-Agents toolkit in a Unity scene, the scene must contain -a single Academy and as many Agent subclasses as you need. +To train and use the ML-Agents toolkit in a Unity scene, the scene must contain as many Agent subclasses as you need. Agent instances should be attached to the GameObject representing that Agent. ### Academy -The Academy object orchestrates Agents and their decision making processes. Only -place a single Academy object in a scene. +The Academy is a singleton which orchestrates Agents and their decision making processes. Only +a single Academy exists at a time. #### Academy resetting To alter the environment at the start of each episode, add your method to the Academy's OnEnvironmentReset action. @@ -81,9 +80,7 @@ public class MySceneBehavior : MonoBehaviour { public void Awake() { - var academy = FindObjectOfType<Academy>(); - academy.LazyInitialization(); - academy.OnEnvironmentReset += EnvironmentReset; + Academy.Instance.OnEnvironmentReset += EnvironmentReset; } void EnvironmentReset() @@ -144,8 +141,6 @@ training and for testing trained agents. Or, you may be training agents to operate in a complex game or simulation. In this case, it might be more efficient and practical to create a purpose-built training scene. -Both training and testing (or normal game) scenes must contain an Academy object -to control the agent decision making process.
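To make the pattern in the snippet above concrete, here is a fuller sketch of a scene-level reset handler registered with the Academy singleton. Only the `Academy.Instance.OnEnvironmentReset` registration comes from the docs being changed; the `target` field and the repositioning logic inside `EnvironmentReset()` are hypothetical.

```csharp
using MLAgents;
using UnityEngine;

// Sketch of a scene behaviour that hooks into the Academy singleton's reset.
// No Academy GameObject or component is required any more.
public class MySceneBehavior : MonoBehaviour
{
    public GameObject target;  // hypothetical scene object to reposition each episode

    public void Awake()
    {
        Academy.Instance.OnEnvironmentReset += EnvironmentReset;
    }

    void EnvironmentReset()
    {
        // Example reset logic: move the target to a new random position.
        target.transform.localPosition = new Vector3(
            Random.Range(-4f, 4f), 0.5f, Random.Range(-4f, 4f));
    }
}
```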
When you create a training environment in Unity, you must set up the scene so that it can be controlled by the external training process. Considerations include: diff --git a/docs/Limitations.md b/docs/Limitations.md index 9a756486b0..ee6675e480 100644 --- a/docs/Limitations.md +++ b/docs/Limitations.md @@ -16,12 +16,13 @@ making. See [Execution Order of Event Functions](https://docs.unity3d.com/Manual/ExecutionOrder.html) for more information. +You can control the frequency of Academy stepping by calling +`Academy.Instance.DisableAutomaticStepping()`, and then calling +`Academy.Instance.EnvironmentStep()` manually. + ## Python API ### Python version As of version 0.3, we no longer support Python 2. -### TensorFlow support - -Currently the Ml-Agents toolkit uses TensorFlow 1.7.1 only. diff --git a/docs/ML-Agents-Overview.md b/docs/ML-Agents-Overview.md index f980a76823..b16c50cea4 100644 --- a/docs/ML-Agents-Overview.md +++ b/docs/ML-Agents-Overview.md @@ -131,17 +131,15 @@ components: _Simplified block diagram of ML-Agents._ -The Learning Environment contains two additional components that help +The Learning Environment contains an additional component that helps organize the Unity scene: - **Agents** - which is attached to a Unity GameObject (any character within a scene) and handles generating its observations, performing the actions it receives and assigning a reward (positive / negative) when appropriate. Each Agent is linked to a Policy. -- **Academy** - which orchestrates the observation and decision making process. - The External Communicator lives within the Academy. -Every Learning Environment will always have one global Academy and one Agent for +Every Learning Environment will always have one Agent for every character in the scene. While each Agent must be linked to a Policy, it is possible for Agents that have similar observations and actions to have the same Policy type. In our sample game, we have two teams each with their own medic. diff --git a/docs/Migrating.md b/docs/Migrating.md index a957497532..879c3de69f 100644 --- a/docs/Migrating.md +++ b/docs/Migrating.md @@ -11,14 +11,14 @@ The versions can be found in ## Migrating from 0.13 to latest ### Important changes -* The Academy class was changed to be sealed and its virtual methods were removed. +* The Academy class was changed to a singleton, and its virtual methods were removed. * Trainer steps are now counted per-Agent, not per-environment as in previous versions. For instance, if you have 10 Agents in the scene, 20 environment steps now correspond to 200 steps as printed in the terminal and in Tensorboard. * Curriculum config files are now YAML formatted and all curricula for a training run are combined into a single file. * The `--num-runs` command-line option has been removed. ### Steps to Migrate * If you have a class that inherits from Academy: - * If the class didn't override any of the virtual methods and didn't store any additional data, you can just replace the instance of it in the scene with an Academy. + * If the class didn't override any of the virtual methods and didn't store any additional data, you can just remove the old script from the scene. * If the class had additional data, create a new MonoBehaviour and store the data on this instead.
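As a companion to the Limitations.md addition above, the sketch below combines the two stepping calls. Stepping from `FixedUpdate` on every other physics step is an arbitrary choice for illustration, not a recommendation from the docs.

```csharp
using MLAgents;
using UnityEngine;

// Illustrative sketch: take over Academy stepping and advance it manually.
public class ManualAcademyStepper : MonoBehaviour
{
    int fixedSteps;

    void Start()
    {
        // Stop the Academy from stepping automatically every FixedUpdate.
        Academy.Instance.DisableAutomaticStepping();
    }

    void FixedUpdate()
    {
        fixedSteps++;

        // Advance the Academy (and all Agents) only on every other physics step.
        if (fixedSteps % 2 == 0)
        {
            Academy.Instance.EnvironmentStep();
        }
    }
}
```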
* If the class overrode the virtual methods, create a new MonoBehaviour and move the logic to it: * Move the InitializeAcademy code to MonoBehaviour.Awake diff --git a/docs/Python-API.md b/docs/Python-API.md index b27ead1ed7..9d92874b8c 100644 --- a/docs/Python-API.md +++ b/docs/Python-API.md @@ -263,7 +263,6 @@ i = env.reset() Once a property has been modified in Python, you can access it in C# after the next call to `step` as follows: ```csharp -var academy = FindObjectOfType<Academy>(); -var sharedProperties = academy.FloatProperties; +var sharedProperties = Academy.Instance.FloatProperties; float property1 = sharedProperties.GetPropertyWithDefault("parameter_1", 0.0f); ``` diff --git a/docs/Training-Curriculum-Learning.md b/docs/Training-Curriculum-Learning.md index 8637fbb36f..8ef2dc5635 100644 --- a/docs/Training-Curriculum-Learning.md +++ b/docs/Training-Curriculum-Learning.md @@ -41,8 +41,8 @@ the same environment. In order to define the curricula, the first step is to decide which parameters of the environment will vary. In the case of the Wall Jump environment, the height of the wall is what varies. We define this as a `Shared Float Property` -that can be accessed in `Academy.FloatProperties`, and by doing so it becomes -adjustable via the Python API. +that can be accessed in `Academy.Instance.FloatProperties`, and by doing +so it becomes adjustable via the Python API. Rather than adjusting it by hand, we will create a YAML file which describes the structure of the curricula. Within it, we can specify which points in the training process our wall height will change, either based on the diff --git a/docs/Training-Generalized-Reinforcement-Learning-Agents.md b/docs/Training-Generalized-Reinforcement-Learning-Agents.md index dcb6ed8358..9f456e095e 100644 --- a/docs/Training-Generalized-Reinforcement-Learning-Agents.md +++ b/docs/Training-Generalized-Reinforcement-Learning-Agents.md @@ -21,7 +21,7 @@ Ball scale of 0.5 | Ball scale of 4 ## Introducing Generalization Using Reset Parameters To enable variations in the environments, we implemented `Reset Parameters`. -`Reset Parameters` are `Academy.FloatProperties` that are used only when +`Reset Parameters` are `Academy.Instance.FloatProperties` that are used only when resetting the environment. We also included different sampling methods and the ability to create new kinds of sampling methods for each `Reset Parameter`. In the 3D ball environment example displayed diff --git a/docs/Training-ML-Agents.md b/docs/Training-ML-Agents.md index e221fd696c..49542dcfe1 100644 --- a/docs/Training-ML-Agents.md +++ b/docs/Training-ML-Agents.md @@ -2,7 +2,7 @@ The ML-Agents toolkit conducts training using an external Python training process. During training, this external process communicates with the Academy -object in the Unity scene to generate a block of agent experiences. These +to generate a block of agent experiences. These experiences become the training set for a neural network used to optimize the agent's policy (which is essentially a mathematical function mapping observations to actions).
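As a C#-side counterpart to the curriculum and Reset Parameter passages above, this sketch reads a shared float property during an environment reset. `Academy.Instance.FloatProperties` and `GetPropertyWithDefault` are the calls shown in the docs being updated; the property name `"wall_height"`, its default value, and the `wall` transform are hypothetical.

```csharp
using MLAgents;
using UnityEngine;

// Sketch: scale a wall using a curriculum-controlled reset parameter.
public class WallJumpArea : MonoBehaviour
{
    public Transform wall;  // hypothetical wall whose height the curriculum varies

    public void ResetArea()
    {
        // Read the value most recently set by the trainer via the Python API.
        var properties = Academy.Instance.FloatProperties;
        float wallHeight = properties.GetPropertyWithDefault("wall_height", 1.0f);

        var scale = wall.localScale;
        wall.localScale = new Vector3(scale.x, wallHeight, scale.z);
    }
}
```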
In reinforcement learning, the neural network diff --git a/docs/images/mlagents-3DBallHierarchy.png b/docs/images/mlagents-3DBallHierarchy.png index 70b455632b..9603e6f77d 100644 Binary files a/docs/images/mlagents-3DBallHierarchy.png and b/docs/images/mlagents-3DBallHierarchy.png differ diff --git a/docs/images/mlagents-NewTutHierarchy.png b/docs/images/mlagents-NewTutHierarchy.png deleted file mode 100644 index d1c4e350c5..0000000000 Binary files a/docs/images/mlagents-NewTutHierarchy.png and /dev/null differ diff --git a/docs/images/mlagents-Open3DBall.png b/docs/images/mlagents-Open3DBall.png index 840ad6b64f..c9e6abaf91 100644 Binary files a/docs/images/mlagents-Open3DBall.png and b/docs/images/mlagents-Open3DBall.png differ