
Conversation

Contributor

@Mayankm96 Mayankm96 commented Mar 5, 2025

Description

Currently, the actions from the policy are applied directly to the environment and are also often fed back to the policy through the last-action observation.

This can lead to instability during training, since applying a large action can introduce a destabilizing feedback loop. More specifically, a very large action produces a large last_action observation, which often results in a large error in the critic, which in turn can lead to even larger actions being sampled in the future.

This PR fixes this for the RSL-RL library by clipping the actions to (large) hard limits before applying them to the environment. This prevents the actions from growing without bound and greatly improves training stability.
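
For context, a minimal sketch of the clipping idea (this is not the exact Isaac Lab/RSL-RL wrapper code; the `CLIP_ACTIONS` value and the `apply_actions` helper are illustrative assumptions):

```python
import torch

# Hypothetical hard limit; in practice this would be taken from the agent/runner config.
CLIP_ACTIONS = 100.0

def apply_actions(env, actions: torch.Tensor):
    """Clamp policy actions to hard limits before stepping the environment."""
    # Clamping keeps the values fed back through the last-action observation bounded,
    # which breaks the loop of ever-growing actions described above.
    clipped_actions = torch.clamp(actions, min=-CLIP_ACTIONS, max=CLIP_ACTIONS)
    return env.step(clipped_actions)
```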

Fixes #984, #1732, #1999

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)

Checklist

  - [x] I have run the pre-commit checks with `./isaaclab.sh --format`
  - [x] I have made corresponding changes to the documentation
  - [x] My changes generate no new warnings
  - [ ] I have added tests that prove my fix is effective or that my feature works
  - [x] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file
  - [x] I have added my name to the `CONTRIBUTORS.md` or my name already exists there

Collaborator

@pascal-roth pascal-roth left a comment


LGTM

@Mayankm96 Mayankm96 added the bug Something isn't working label Mar 10, 2025
@kellyguo11 kellyguo11 merged commit f774425 into main Mar 13, 2025
4 of 5 checks passed
@kellyguo11 kellyguo11 deleted the fix/rsl-rl-clip branch March 13, 2025 02:17
jtigue-bdai pushed a commit that referenced this pull request Apr 14, 2025
ToxicNS pushed a commit to ToxicNS/IsaacLab that referenced this pull request Apr 24, 2025