[Bug Report] actor's std becomes "nan" during PPO training

I am conducting reinforcement learning for a robot using rsl_rl and isaac lab. While it works fine with simple settings, when I switch to more complex settings (such as Domain Randomization), the following error occurs during training（After some progress in training）, indicating that the actor's standard deviation does not meet the condition of being ≥ 0. Has anyone experienced a similar error?
num_env is 3600
```
Traceback (most recent call last):
  File "/root/IsaacLab/source/standalone/workflows/rsl_rl/train.py", line 131, in <module>
    main()
  File "/root/IsaacLab/source/standalone/workflows/rsl_rl/train.py", line 123, in main
    runner.learn(num_learning_iterations=agent_cfg.max_iterations, init_at_random_ep_len=True)
  File "/isaac-sim/kit/python/lib/python3.10/site-packages/rsl_rl/runners/on_policy_runner.py", line 153, in learn
    mean_value_loss, mean_surrogate_loss = self.alg.update()
  File "/isaac-sim/kit/python/lib/python3.10/site-packages/rsl_rl/algorithms/ppo.py", line 121, in update
    self.actor_critic.act(obs_batch, masks=masks_batch, hidden_states=hid_states_batch[0])
  File "/isaac-sim/kit/python/lib/python3.10/site-packages/rsl_rl/modules/actor_critic.py", line 105, in act
    
  File "/isaac-sim/exts/omni.isaac.ml_archive/pip_prebundle/torch/distributions/normal.py", line 74, in sample
    return torch.normal(self.loc.expand(shape), self.scale.expand(shape))  
RuntimeError: normal expects all elements of std >= 0.0
```

I investigated the value of std(self.scale) and found that the std value in a certain environment appears to be nan. (The number of columns represents the action dimensions for the robot.)

```
self.scale: tensor([[0.1926, 0.2051, 0.1785, ..., 0.7033, 0.8655, 0.8500],
[0.1926, 0.2051, 0.1785, ..., 0.7033, 0.8655, 0.8500],
[0.1926, 0.2051, 0.1785, ..., 0.7033, 0.8655, 0.8500],
...,
[0.1926, 0.2051, 0.1785, ..., 0.7033, 0.8655, 0.8500],
[0.1926, 0.2051, 0.1785, ..., 0.7033, 0.8655, 0.8500],
[0.1926, 0.2051, 0.1785, ..., 0.7033, 0.8655, 0.8500]],
device='cuda:0')
env_id: 1111, row: tensor([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
       device='cuda:0')

```

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Bug Report] actor's std becomes "nan" during PPO training #33

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[Bug Report] actor's std becomes "nan" during PPO training #33

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions