Clips actions to large limits before applying them to the environment #984
Conversation
Slightly related issue: #673
Resolved review comments (outdated) on:
source/extensions/omni.isaac.lab/omni/isaac/lab/envs/direct_rl_env_cfg.py
source/extensions/omni.isaac.lab/omni/isaac/lab/envs/manager_based_env_cfg.py
Mayankm96 left a comment:
Should we also change the gym.spaces range to these bounds?
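For context, a minimal sketch of what aligning the action space with such bounds could look like (assuming gymnasium and a placeholder `num_actions`; this is illustrative, not the change proposed in this PR):

```python
import numpy as np
import gymnasium as gym

# Hypothetical: a Box action space whose limits match the clipping bounds.
num_actions = 12  # task-specific placeholder
action_space = gym.spaces.Box(low=-100.0, high=100.0, shape=(num_actions,), dtype=np.float32)
```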
In my opinion, RL libraries should take care of this, not Isaac Lab.
@Toni-SM I agree. We should move this to the environment wrappers (similar to what we do for RL-Games). Regarding the action/obs space design for the environments, I think it is better to handle that as its own separate change. The current fix in this MR is at least critical for continuous learning tasks: without it, users get NaNs from the simulation due to the policy feedback loop (a large action enters the observations, which then leads to even larger action predictions, which eventually drives the sim unstable). So I'd prefer that we don't block this fix itself.
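A minimal sketch of the wrapper-level approach being suggested here (hypothetical code, not the actual RL-Games wrapper; names and bounds are placeholders):

```python
import numpy as np
import gymnasium as gym


class ClipActionWrapper(gym.ActionWrapper):
    """Clip actions to fixed bounds before they reach the environment (illustrative only)."""

    def __init__(self, env: gym.Env, low: float = -100.0, high: float = 100.0):
        super().__init__(env)
        self._low = low
        self._high = high

    def action(self, action):
        # Bound the policy output so it cannot grow without limit.
        return np.clip(action, self._low, self._high)
```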
Reviewed diff in the env config (following the docstring line "Please refer to the :class:`omni.isaac.lab.utils.noise.NoiseModel` class for more details."):

    action_bounds: list[float] = [-100, 100]
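As a rough illustration of how such a config field could be consumed (a sketch under assumed names, not the actual Isaac Lab implementation):

```python
import torch

# Assumed: `action_bounds` holds [low, high] and `actions` is the raw policy output.
def clip_actions(actions: torch.Tensor, action_bounds: list[float]) -> torch.Tensor:
    low, high = action_bounds
    return torch.clamp(actions, min=low, max=high)

# e.g. inside the environment's step, before the actions are applied to the sim:
# actions = clip_actions(actions, self.cfg.action_bounds)
```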
Just curious, where does [-100, 100] come from? I wonder if it's best to leave this user-specified?
The ±100 limits come from our internal codebase, originally from legged gym.
I was considering having it None or inf by default, but then users need to consciously set this value, and I think most people who hit training stability issues will probably not think of that.
Could we set it to None and add an FAQ entry to the docs?
I think we can set it to (-inf, inf).
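A small sketch of what that default could look like, with clipping only taking effect when finite bounds are given (hypothetical names; with infinite bounds the clamp is effectively a no-op):

```python
import math
import torch

# Assumed default: no effective clipping unless the user narrows the bounds.
DEFAULT_ACTION_BOUNDS: tuple[float, float] = (-math.inf, math.inf)

def apply_action_bounds(actions: torch.Tensor, bounds: tuple[float, float]) -> torch.Tensor:
    low, high = bounds
    if math.isinf(low) and math.isinf(high):
        return actions  # default: pass the actions through unchanged
    return torch.clamp(actions, min=low, max=high)
```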
…_env_cfg.py Co-authored-by: Mayank Mittal <[email protected]> Signed-off-by: renezurbruegg <[email protected]>
…ased_env_cfg.py Co-authored-by: Mayank Mittal <[email protected]> Signed-off-by: renezurbruegg <[email protected]>
@renezurbruegg Would you be able to help move the changes to the wrappers?
This will introduce "arbitrary" bounds of [-100, 100] for any new user who merges this PR, which could lead to unexpected behaviour. How should this be addressed? In my opinion there are three options:
I personally prefer option (3).
Please note that the current implementation conflicts with #1117 for the direct workflow.
Can these changes then be integrated directly into #1117?
@renezurbruegg, as I commented previously, in my opinion the RL libraries should take care of this, not Isaac Lab. For example, using skrl you can set a model parameter to clip the actions. However, if the target library cannot take care of that, then option 3 that you mentioned (which will not prevent training from eventually throwing an exception), or clipping the actions directly in the task implementation for critical cases, could be a solution.
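For reference, a minimal sketch of the library-side approach described here, using skrl's GaussianMixin with action clipping enabled (illustrative; the network architecture is an arbitrary placeholder):

```python
import torch
import torch.nn as nn
from skrl.models.torch import Model, GaussianMixin


class Policy(GaussianMixin, Model):
    def __init__(self, observation_space, action_space, device, clip_actions=True):
        Model.__init__(self, observation_space, action_space, device)
        # clip_actions=True makes skrl clip sampled actions to the action-space bounds.
        GaussianMixin.__init__(self, clip_actions=clip_actions)

        self.net = nn.Sequential(
            nn.Linear(self.num_observations, 64), nn.ELU(),
            nn.Linear(64, self.num_actions),
        )
        self.log_std_parameter = nn.Parameter(torch.zeros(self.num_actions))

    def compute(self, inputs, role):
        # Return the action mean, log standard deviation, and an (empty) outputs dict.
        return self.net(inputs["states"]), self.log_std_parameter, {}
```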
This MR is important. However, I agree with Toni's remark that the RL library should handle it. I have made a new MR for the RL wrapper environment: #2019. Closing this MR as it has become stale.
(Description of the follow-up MR #2019:)

# Description

Currently, the actions from the policy are directly applied to the environment and are also often fed back to the policy via the last-action observation. This can lead to instability during training, since applying a large action can introduce a self-reinforcing feedback loop. More specifically, applying a very large action leads to large last_action observations, which often results in a large error in the critic, which can lead to even larger actions being sampled in the future. This PR fixes this for the RSL-RL library by clipping the actions to (large) hard limits before applying them to the environment. This prevents the actions from growing without bound and greatly improves training stability.

Fixes #984, #1732, #1999

## Type of change

- Bug fix (non-breaking change which fixes an issue)
- New feature (non-breaking change which adds functionality)

## Checklist

- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with `./isaaclab.sh --format`
- [x] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [x] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already exists there
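A rough sketch of what wrapper-level clipping for RSL-RL could look like (hypothetical code, not the actual contents of #2019; bounds and names are placeholders):

```python
import torch


class ClippedActionVecEnvWrapper:
    """Illustrative vectorized-env wrapper that clips actions before stepping (not the real #2019 code)."""

    def __init__(self, env, clip_actions: float = 100.0):
        self.env = env
        self.clip_actions = clip_actions

    def step(self, actions: torch.Tensor):
        # Clip the raw policy output to +/- clip_actions before it reaches the environment.
        actions = torch.clamp(actions, -self.clip_actions, self.clip_actions)
        return self.env.step(actions)

    def __getattr__(self, name):
        # Forward everything else to the wrapped environment.
        return getattr(self.env, name)
```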
Description
Currently, the actions from the policy are directly applied to the environment and are also often fed back to the policy via the last-action observation.
Doing this can lead to instability during training, since applying a large action can introduce a self-reinforcing feedback loop.
More specifically, applying a very large action leads to large last_action observations, which often results in a large error in the critic, which can lead to even larger actions being sampled in the future.
This PR aims to fix this by clipping the actions to (large) hard limits before applying them to the environment. This prevents the actions from growing continuously and, in my case, greatly improves training stability.
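To make the feedback loop concrete, here is a toy numerical illustration (not Isaac Lab code): a fake "policy" whose output scales with the last-action observation diverges without clipping but stays bounded with it.

```python
import numpy as np

def rollout(clip=None, steps=20):
    last_action = 1.0
    for _ in range(steps):
        # Toy policy: the output grows with the last-action observation (gain > 1 models the unstable loop).
        action = 1.5 * last_action
        if clip is not None:
            action = np.clip(action, -clip, clip)
        last_action = action  # fed back as an observation at the next step
    return last_action

print(rollout(clip=None))   # diverges: 1.5**20 ~ 3.3e3 and keeps growing with more steps
print(rollout(clip=100.0))  # saturates at the bound: 100.0
```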
Type of change
TODO
Checklist
- I have run the `pre-commit` checks with `./isaaclab.sh --format`
- I have made corresponding changes to the documentation
- My changes generate no new warnings
- I have added tests that prove my fix is effective or that my feature works
- I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file
- I have added my name to the `CONTRIBUTORS.md` or my name already exists there