[Bug Report] Default Rough Policy for RSL-RL on Go2 Locomotion Fails as a Result of NaN Values #1999

I'm training the default rough policy for RSL-RL on the Go2 locomotion task. After roughly 5000-8000 iterations, the value function loss explodes to `inf`, the policy output becomes NaN, and training fails with a `ValueError`.

### Steps to reproduce

Run the default training script. After several thousand iterations, the output looks something like this:

```
################################
                     Learning iteration 8391/10000

                       Computation: 14963 steps/s (collection: 6.407s, learning 0.163s)
               Value function loss: inf
                    Surrogate loss: 0.0000
             Mean action noise std: 0.56
                 Mean total reward: 25.66
               Mean episode length: 1000.00
Episode_Reward/track_lin_vel_xy_exp: 1.2263
Episode_Reward/track_ang_vel_z_exp: 0.6265
       Episode_Reward/lin_vel_z_l2: -0.0483
      Episode_Reward/ang_vel_xy_l2: -0.1196
     Episode_Reward/dof_torques_l2: -0.0977
         Episode_Reward/dof_acc_l2: -0.1826
     Episode_Reward/action_rate_l2: -0.1150
      Episode_Reward/feet_air_time: -0.0014
Episode_Reward/flat_orientation_l2: 0.0000
     Episode_Reward/dof_pos_limits: 0.0000
         Curriculum/terrain_levels: 6.1274
Metrics/base_velocity/error_vel_xy: 0.4202
Metrics/base_velocity/error_vel_yaw: 0.3767
      Episode_Termination/time_out: 3.7917
  Episode_Termination/base_contact: 0.0000
--------------------------------------------------------------------------------
                   Total timesteps: 824967168
                    Iteration time: 6.57s
                        Total time: 59447.90s
                               ETA: 11398.0s
...
ValueError: Expected parameter loc (Tensor of shape (24576, 12)) of distribution Normal(loc: torch.Size([24576, 12]), scale: torch.Size([24576, 12])) to satisfy the constraint Real(), but found invalid values:
tensor([[nan, nan, nan,  ..., nan, nan, nan],                                                          
        [nan, nan, nan,  ..., nan, nan, nan],                                                          
        [nan, nan, nan,  ..., nan, nan, nan],                                                          
        ...,                                                                                           
        [nan, nan, nan,  ..., nan, nan, nan],                                                          
        [nan, nan, nan,  ..., nan, nan, nan],                                                          
        [nan, nan, nan,  ..., nan, nan, nan]], device='cuda:0',                                        
       grad_fn=<AddmmBackward0>)
```
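To narrow down when the divergence starts, it may help to scan a saved copy of the console output for the first iteration whose value function loss is no longer finite. The following is a minimal sketch, assuming the log uses the `Learning iteration N/M` and `Value function loss: X` line formats shown above. The function name `first_nonfinite_value_loss` and the embedded sample text are hypothetical and are not part of RSL-RL.

```python
import math
import re

# Patterns matching the console lines shown in the log above.
ITER_RE = re.compile(r"Learning iteration (\d+)/(\d+)")
LOSS_RE = re.compile(r"Value function loss:\s*(\S+)")


def first_nonfinite_value_loss(log_text):
    """Return the first iteration whose value function loss is inf or nan.

    Returns None if every reported loss is finite.
    """
    current_iter = None
    for line in log_text.splitlines():
        match = ITER_RE.search(line)
        if match:
            # Remember which iteration the following metrics belong to.
            current_iter = int(match.group(1))
            continue
        match = LOSS_RE.search(line)
        if match:
            try:
                loss = float(match.group(1))  # float() accepts "inf" and "nan"
            except ValueError:
                continue  # skip values that cannot be parsed as a number
            if not math.isfinite(loss):
                return current_iter
    return None


# Hypothetical sample: the first block is made up, the second mirrors the report.
sample_log = """
Learning iteration 8390/10000
Value function loss: 0.0123
Learning iteration 8391/10000
Value function loss: inf
"""

bad_iteration = first_nonfinite_value_loss(sample_log)
```

With the sample above, `bad_iteration` is the iteration number of the block that reports `inf`. Loading a checkpoint saved before that iteration could then be a starting point for inspecting which observations or rewards blow up first.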



