Closed
Description
I'm training the default rough policy with RSL-RL on the Go2 locomotion task. After roughly 5000-8000 iterations, the value function loss explodes to inf, which then produces NaN values and causes training to fail.
Steps to reproduce
Run the default training script. The output should look something like this:
################################
Learning iteration 8391/10000
Computation: 14963 steps/s (collection: 6.407s, learning 0.163s)
Value function loss: inf
Surrogate loss: 0.0000
Mean action noise std: 0.56
Mean total reward: 25.66
Mean episode length: 1000.00
Episode_Reward/track_lin_vel_xy_exp: 1.2263
Episode_Reward/track_ang_vel_z_exp: 0.6265
Episode_Reward/lin_vel_z_l2: -0.0483
Episode_Reward/ang_vel_xy_l2: -0.1196
Episode_Reward/dof_torques_l2: -0.0977
Episode_Reward/dof_acc_l2: -0.1826
Episode_Reward/action_rate_l2: -0.1150
Episode_Reward/feet_air_time: -0.0014
Episode_Reward/flat_orientation_l2: 0.0000
Episode_Reward/dof_pos_limits: 0.0000
Curriculum/terrain_levels: 6.1274
Metrics/base_velocity/error_vel_xy: 0.4202
Metrics/base_velocity/error_vel_yaw: 0.3767
Episode_Termination/time_out: 3.7917
Episode_Termination/base_contact: 0.0000
--------------------------------------------------------------------------------
Total timesteps: 824967168
Iteration time: 6.57s
Total time: 59447.90s
ETA: 11398.0s
...
ValueError: Expected parameter loc (Tensor of shape (24576, 12)) of distribution Normal(loc: torch.Size([24576, 12]), scale: torch.Size([24576, 12])) to satisfy the constraint Real(), but found invalid values:
tensor([[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]], device='cuda:0',
grad_fn=<AddmmBackward0>)
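The log above shows the value function loss reaching inf shortly before the action distribution receives NaN means. As a minimal sketch of why an infinite value can turn into NaN, and of how a finiteness check could catch the first bad iteration, here is a plain-Python illustration. It makes no assumptions about RSL-RL internals: the helper names `first_non_finite` and `guarded_step` are hypothetical and exist only for this example.

```python
import math


def first_non_finite(losses):
    """Return the index of the first inf/NaN loss, or None if all are finite."""
    for i, loss in enumerate(losses):
        if not math.isfinite(loss):
            return i
    return None


def guarded_step(param, grad, lr):
    """Apply a gradient step only when the gradient is finite.

    Returns (new_param, applied). Hypothetical helper, not part of RSL-RL.
    """
    if not math.isfinite(grad):
        return param, False  # skip the update and keep the old parameter
    return param - lr * grad, True


inf = float("inf")

# An infinite quantity becomes NaN as soon as it meets another inf or a zero.
nan_from_sub = inf - inf
nan_from_mul = inf * 0.0

# Made-up loss history for illustration only (not taken from the log above).
example_losses = [0.12, 0.09, inf, float("nan")]
bad_index = first_non_finite(example_losses)
```

In a real training loop the same idea would apply to the loss and gradients before the optimizer step, so that the run stops (or skips the update) at the first non-finite value instead of failing later inside the distribution constructor.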