
Conversation

Contributor

@Mayankm96 Mayankm96 commented Mar 5, 2025

Description

Currently, the actions from the policy are applied directly to the environment and are also often fed back to the policy through the last-action observation.

This can lead to instability during training, since applying a large action can introduce a destabilizing feedback loop. More specifically, a very large action produces a large last_action observation, which often results in a large error in the critic, which in turn can lead to even larger actions being sampled in the future.

This PR fixes this for the RSL-RL library by clipping the actions to (large) hard limits before applying them to the environment. This prevents the actions from growing without bound and greatly improves training stability.
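
For context, a minimal sketch of the clipping idea (this is not the exact Isaac Lab/RSL-RL wrapper code; the `CLIP_ACTIONS` value and the `apply_actions` helper are illustrative assumptions):

```python
import torch

# Hypothetical hard limit; in practice this would be taken from the agent/runner config.
CLIP_ACTIONS = 100.0

def apply_actions(env, actions: torch.Tensor):
    """Clamp policy actions to hard limits before stepping the environment."""
    # Clamping keeps the values fed back through the last-action observation bounded,
    # which breaks the loop of ever-growing actions described above.
    clipped_actions = torch.clamp(actions, min=-CLIP_ACTIONS, max=CLIP_ACTIONS)
    return env.step(clipped_actions)
```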

Fixes #984, #1732, #1999

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)

Checklist

  - [x] I have run the pre-commit checks with `./isaaclab.sh --format`
  - [x] I have made corresponding changes to the documentation
  - [x] My changes generate no new warnings
  - [ ] I have added tests that prove my fix is effective or that my feature works
  - [x] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file
  - [x] I have added my name to the `CONTRIBUTORS.md` or my name already exists there

Collaborator

@pascal-roth pascal-roth left a comment


LGTM

@Mayankm96 Mayankm96 added the bug Something isn't working label Mar 10, 2025
@kellyguo11 kellyguo11 merged commit f774425 into main Mar 13, 2025
4 of 5 checks passed
@kellyguo11 kellyguo11 deleted the fix/rsl-rl-clip branch March 13, 2025 02:17
jtigue-bdai pushed a commit that referenced this pull request Apr 14, 2025
ToxicNS pushed a commit to ToxicNS/IsaacLab that referenced this pull request Apr 24, 2025