|
19 | 19 | - [RayCast Observation Summary & Best Practices](#raycast-observation-summary--best-practices) |
20 | 20 | - [Variable Length Observations](#variable-length-observations) |
21 | 21 | - [Variable Length Observation Summary & Best Practices](#variable-length-observation-summary--best-practices) |
| 22 | + - [Goal Signal](#goal-signal) |
| 23 | + - [Goal Signal Summary & Best Practices](#goal-signal-summary--best-practices) |
22 | 24 | - [Actions and Actuators](#actions-and-actuators) |
23 | 25 | - [Continuous Actions](#continuous-actions) |
24 | 26 | - [Discrete Actions](#discrete-actions) |
@@ -562,6 +564,36 @@ between -1 and 1. |
562 | 564 | of an entity to the `BufferSensor`. |
563 | 565 | - Normalize the entities observations before feeding them into the `BufferSensor`. |
564 | 566 |
|
| 567 | +### Goal Signal |
| 568 | + |
| 569 | +It is possible for agents to collect observations that will be treated as a "goal signal".
| 570 | +A goal signal is used to condition the policy of the agent, meaning that if the goal |
| 571 | +changes, the policy (i.e. the mapping from observations to actions) will change |
| 572 | +as well. Note that this is true |
| 573 | +for any observation since all observations influence the policy of the Agent to |
| 574 | +some degree. But by specifying a goal signal explicitly, we can make this conditioning |
| 575 | +more important to the agent. This feature is useful in settings where an agent
| 576 | +must learn to solve different tasks that are similar in some respects, because the
| 577 | +agent can reuse what it learns on one task to generalize better to the others.
| 578 | +In Unity, you can specify that a `VectorSensor` or |
| 579 | +a `CameraSensor` is a goal by attaching a `VectorSensorComponent` or a |
| 580 | +`CameraSensorComponent` to the Agent and selecting `Goal Signal` as `Observation Type`. |
| 581 | +On the trainer side, there are two different ways to condition the policy. This |
| 582 | +setting is determined by the |
| 583 | +[conditioning_type parameter](Training-Configuration-File.md#common-trainer-configurations). |
| 584 | +If set to `hyper` (the default), a [HyperNetwork](https://arxiv.org/pdf/1609.09106.pdf)
| 585 | +will be used to generate some of the
| 586 | +weights of the policy using the goal observations as input. Note that a
| 587 | +HyperNetwork is computationally expensive, so it is recommended to use a smaller
| 588 | +number of hidden units in the policy to compensate.
| 589 | +If set to `none`, the goal signal will be treated as a regular observation.
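For illustration, a trainer configuration using HyperNetwork conditioning might look like the sketch below. The behavior name is hypothetical, and the exact key name and placement are documented in [Training-Configuration-File.md](Training-Configuration-File.md#common-trainer-configurations); this sketch assumes it sits under `network_settings` as `goal_conditioning_type`.

```yaml
behaviors:
  MyGoalConditionedBehavior:        # hypothetical behavior name
    trainer_type: ppo
    network_settings:
      hidden_units: 128             # keep this modest when using HyperNetwork conditioning
      goal_conditioning_type: hyper # "hyper" (default) or "none"
    # ... other trainer settings (hyperparameters, reward_signals, etc.)
```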
| 590 | + |
| 591 | +#### Goal Signal Summary & Best Practices |
| 592 | + - Attach a `VectorSensorComponent` or `CameraSensorComponent` to an agent and
| 593 | +   set its `Observation Type` to `Goal Signal` (see the sketch after this list).
| 594 | + - Set the conditioning_type parameter in the training configuration. |
| 595 | + - Reduce the number of hidden units in the network when using the HyperNetwork |
| 596 | + conditioning type. |
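Below is a minimal Unity-side sketch. The class and field names are hypothetical; it assumes a `VectorSensorComponent` was added to the agent in the Inspector with `Observation Type` set to `Goal Signal` and an observation size of 3, and that the component exposes its sensor via `GetSensor()` as in the ML-Agents example scenes.

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Sensors;
using UnityEngine;

// Hypothetical agent with a VectorSensorComponent attached in the Editor,
// configured with Observation Size = 3 and Observation Type = Goal Signal.
public class GoalConditionedAgent : Agent
{
    VectorSensorComponent m_GoalSensor;
    int m_CurrentTaskId;   // which of the 3 tasks to solve, chosen at episode start

    public override void Initialize()
    {
        // Grab the goal sensor that was attached in the Inspector.
        m_GoalSensor = GetComponent<VectorSensorComponent>();
    }

    public override void OnEpisodeBegin()
    {
        // Pick a task for this episode; the policy will be conditioned on it.
        m_CurrentTaskId = Random.Range(0, 3);
    }

    public override void CollectObservations(VectorSensor sensor)
    {
        // Regular observations go through the default VectorSensor...
        sensor.AddObservation(transform.localPosition);

        // ...while the goal (a one-hot task id) is written to the goal sensor.
        m_GoalSensor.GetSensor().AddOneHotObservation(m_CurrentTaskId, 3);
    }
}
```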
565 | 597 |
|
566 | 598 | ## Actions and Actuators |
567 | 599 |
|
|