[Bug Report] Default Rough Policy for RSL-RL on Go2 Locomotion Fails as a Result of NaN Values #1999

@alextong1010

Description

I'm training the default rough policy with RSL-RL on the Go2 locomotion task. After roughly 5000-8000 iterations, the value function loss explodes to inf; the network then returns NaN values and training fails.
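One way to catch this earlier is to check a tensor for non-finite values before it is used. The following is only a minimal sketch in plain PyTorch: `assert_finite` is a hypothetical helper, not part of RSL-RL or Isaac Lab, and the `value_loss` tensor is a stand-in rather than a value taken from the real training loop.

```python
import torch


def assert_finite(name: str, tensor: torch.Tensor) -> None:
    """Raise a readable error if `tensor` contains NaN or inf (hypothetical helper)."""
    finite_mask = torch.isfinite(tensor)
    if not bool(finite_mask.all()):
        n_bad = int((~finite_mask).sum())
        raise FloatingPointError(f"{name}: {n_bad} non-finite value(s)")


# Stand-in for a value function loss that has already diverged.
value_loss = torch.tensor(float("inf"))
try:
    assert_finite("value_loss", value_loss)
except FloatingPointError as err:
    print(err)
```

Calling such a check on the loss before the backward pass would stop the run at the first bad update, instead of a later iteration where the action distribution is built from NaN means.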

Steps to reproduce

Run the default training script. After several thousand iterations it should print something like this:

################################
                     Learning iteration 8391/10000

                       Computation: 14963 steps/s (collection: 6.407s, learning 0.163s)
               Value function loss: inf
                    Surrogate loss: 0.0000
             Mean action noise std: 0.56
                 Mean total reward: 25.66
               Mean episode length: 1000.00
Episode_Reward/track_lin_vel_xy_exp: 1.2263
Episode_Reward/track_ang_vel_z_exp: 0.6265
       Episode_Reward/lin_vel_z_l2: -0.0483
      Episode_Reward/ang_vel_xy_l2: -0.1196
     Episode_Reward/dof_torques_l2: -0.0977
         Episode_Reward/dof_acc_l2: -0.1826
     Episode_Reward/action_rate_l2: -0.1150
      Episode_Reward/feet_air_time: -0.0014
Episode_Reward/flat_orientation_l2: 0.0000
     Episode_Reward/dof_pos_limits: 0.0000
         Curriculum/terrain_levels: 6.1274
Metrics/base_velocity/error_vel_xy: 0.4202
Metrics/base_velocity/error_vel_yaw: 0.3767
      Episode_Termination/time_out: 3.7917
  Episode_Termination/base_contact: 0.0000
--------------------------------------------------------------------------------
                   Total timesteps: 824967168
                    Iteration time: 6.57s
                        Total time: 59447.90s
                               ETA: 11398.0s
...
ValueError: Expected parameter loc (Tensor of shape (24576, 12)) of distribution Normal(loc: torch.Size([24576, 12]), scale: torch.Size([24576, 12])) to satisfy the constraint Real(), but found invalid values:
tensor([[nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        ...,
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan]], device='cuda:0',
       grad_fn=<AddmmBackward0>)
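
To pin down when the loss first diverges, the saved console output can be scanned for the first iteration whose value function loss is not finite. This is a rough sketch that assumes the log format shown above. `first_nonfinite_value_loss` is a hypothetical helper, and the `0.0123` loss in the sample text is made up for illustration.

```python
import math
import re

ITER_RE = re.compile(r"Learning iteration\s+(\d+)/(\d+)")
LOSS_RE = re.compile(r"Value function loss:\s*(\S+)")


def first_nonfinite_value_loss(log_text):
    """Return the first iteration whose value function loss is inf or NaN, else None."""
    current_iter = None
    for line in log_text.splitlines():
        match = ITER_RE.search(line)
        if match:
            current_iter = int(match.group(1))
            continue
        match = LOSS_RE.search(line)
        if match:
            try:
                loss = float(match.group(1))  # float() accepts "inf" and "nan"
            except ValueError:
                continue  # skip anything that is not a number
            if not math.isfinite(loss):
                return current_iter
    return None


# Illustrative sample: the 8390 entry and its loss value are invented.
sample = """\
                     Learning iteration 8390/10000
               Value function loss: 0.0123
                     Learning iteration 8391/10000
               Value function loss: inf
"""
print(first_nonfinite_value_loss(sample))  # prints 8391
```

Knowing the first bad iteration makes it easier to pick the checkpoint to resume from when trying to reproduce the failure.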
