Description
How do I define a successful training run? If play.py shows residual motion, what should be done before Sim2Real transfer?
The result of ShadowHandOver env
I ran the play script with this command:
./isaaclab.sh -p scripts/reinforcement_learning/skrl/play.py --task=Isaac-Shadow-Hand-Over-Direct-v0 --num_env=1 --algorithm=MAPPO
As demonstrated in the video, residual motion is observed in the fingers after the object is transferred to the opposite hand. Additionally, the left hand exhibits noticeable shaking while holding the object.
Modifying the reward functions of my own RL problem does not stop the residual motion.
I see the same behavior in my own multi-agent reinforcement learning setup. I hypothesized that modifying the reward functions would eliminate the residual motion, but this proved ineffective. I had previously reported this shaking behavior in issue #1935. With merge request #1972, I adjusted the scaling of my reward functions.
I first adjusted the scales of joint_vel_l2() and action_rate_l2(), but the issue persisted. I then added further reward components, such as action_prv_action() and joint_acc_rate_l2(); the residual motion still remained. Below are the related functions:
import torch


def action_rate_new(actions: torch.Tensor, prv_actions: torch.Tensor, prv_prv_actions: torch.Tensor) -> torch.Tensor:
    """Penalize the second-order difference of the actions (action acceleration) using L2 squared kernel."""
    return torch.sum(torch.square(actions - 2 * prv_actions + prv_prv_actions), dim=1)


def action_rate_l2(action: torch.Tensor, prv_action: torch.Tensor) -> torch.Tensor:
    """Penalize the rate of change of the actions using L2 squared kernel."""
    return torch.sum(torch.square(action - prv_action), dim=1)


def joint_vel_l2(joint_vel: torch.Tensor, joint_ids: list[int]) -> torch.Tensor:
    """Penalize joint velocities on the articulation using L2 squared kernel."""
    return torch.sum(torch.square(joint_vel[:, joint_ids]), dim=1)


def joint_acc_l2(joint_acc: torch.Tensor, joint_ids: list[int]) -> torch.Tensor:
    """Penalize joint accelerations on the articulation using L2 squared kernel."""
    return torch.sum(torch.square(joint_acc[:, joint_ids]), dim=1)
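For reference, here is a minimal sketch of how the two action penalties respond to different motion patterns. The scales (ACTION_RATE_SCALE, ACTION_ACC_SCALE), the smoothness_penalty() helper, and the toy action values are hypothetical and chosen only for illustration; they are not the values used in my training or in the ShadowHandOver task.

```python
import torch


def action_rate_l2(action: torch.Tensor, prv_action: torch.Tensor) -> torch.Tensor:
    """First-order difference penalty (same as the function above)."""
    return torch.sum(torch.square(action - prv_action), dim=1)


def action_rate_new(actions: torch.Tensor, prv_actions: torch.Tensor, prv_prv_actions: torch.Tensor) -> torch.Tensor:
    """Second-order difference penalty (same as the function above)."""
    return torch.sum(torch.square(actions - 2 * prv_actions + prv_prv_actions), dim=1)


# Hypothetical scales, for illustration only.
ACTION_RATE_SCALE = -0.01
ACTION_ACC_SCALE = -0.005


def smoothness_penalty(actions, prv_actions, prv_prv_actions):
    """Hypothetical helper: weighted sum of both action penalties (always <= 0)."""
    return ACTION_RATE_SCALE * action_rate_l2(actions, prv_actions) + ACTION_ACC_SCALE * action_rate_new(
        actions, prv_actions, prv_prv_actions
    )


# Two toy environments with three action dimensions each.
# env 0: steady ramp        0.0 -> 0.1 -> 0.2
# env 1: sign-flipping jitter 0.1 -> -0.1 -> 0.1
prv_prv = torch.tensor([[0.0, 0.0, 0.0], [0.1, 0.1, 0.1]])
prv = torch.tensor([[0.1, 0.1, 0.1], [-0.1, -0.1, -0.1]])
cur = torch.tensor([[0.2, 0.2, 0.2], [0.1, 0.1, 0.1]])

rate = action_rate_l2(cur, prv)            # nonzero for both the ramp and the jitter
acc = action_rate_new(cur, prv, prv_prv)   # zero for the ramp, large for the jitter
total = smoothness_penalty(cur, prv, prv_prv)

print("action_rate_l2 :", rate)
print("action_rate_new:", acc)
print("weighted sum   :", total)
```

In this toy case the second-order term ignores the steady ramp and only penalizes the jitter, whereas the first-order term penalizes both. Whether either term is strong enough to suppress the shaking depends on its scale relative to the task reward.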
I also experimented with increasing the rollout length, the episode length (episode_length_s), and the mini-batch size, yet the residual motion remained.
If the outcome shown in the video is considered successful, what additional steps are required for the Sim2Real transfer?
Thanks a bunch! I’ve really appreciated all the help from the team.