[Question] How to define a successful training? If residual motion is shown using play.py, what should be done before Sim2Real transfer? Residual motion in shadow_hand_over direct environment MAPPO #2049

### How to define a successful training? If residual motion is shown using play.py, what should be done before Sim2Real transfer?
### The result of ShadowHandOver env
I ran the play script with this command:

```
./isaaclab.sh -p scripts/reinforcement_learning/skrl/play.py --task=Isaac-Shadow-Hand-Over-Direct-v0 --num_env=1 --algorithm=MAPPO
```
As demonstrated in the video, residual motion is observed in the fingers after the object is transferred to the opposite hand. Additionally, the left hand exhibits noticeable shaking while holding the object.
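To make "residual motion" measurable rather than judged by eye from the video, one option is to compute the RMS joint velocity over the steps after the hand-over should have settled. The following is only a minimal sketch under stated assumptions: it assumes joint velocities are logged per step during play, and the function name `residual_motion_rms`, the `settle_step` value, and the 0.05 threshold are hypothetical and not part of Isaac Lab or skrl. The data is synthetic.

```python
import torch


def residual_motion_rms(joint_vel_log: torch.Tensor, settle_step: int) -> torch.Tensor:
    """RMS joint velocity per joint over the steps at and after `settle_step`.

    joint_vel_log: (num_steps, num_joints) velocities logged during play (hypothetical log).
    settle_step: first step at which the hand is expected to be at rest (assumption).
    """
    tail = joint_vel_log[settle_step:]
    return torch.sqrt(torch.mean(torch.square(tail), dim=0))


# Synthetic example: 100 steps, 2 joints.
# Joint 0 moves for the first 50 steps and then stops.
# Joint 1 keeps flipping between +0.5 and -0.5 (jitter).
steps = torch.arange(100, dtype=torch.float32)
joint0 = torch.where(steps < 50, torch.ones(100), torch.zeros(100))
joint1 = 0.5 * torch.where(steps % 2 == 0, torch.ones(100), -torch.ones(100))
log = torch.stack([joint0, joint1], dim=1)

rms = residual_motion_rms(log, settle_step=50)
jittery = rms > 0.05  # hypothetical threshold
print(rms, jittery)
```

A per-joint number like this could be tracked across training runs to check whether a reward change actually reduces the shaking.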

### Modifying the reward functions of my own RL problem does not stop the residual motion
This section describes my attempt to develop a multi-agent reinforcement learning system. I hypothesized that modifying the reward functions would eliminate the residual motion, but this proved ineffective. I had previously reported this shaking behavior in issue #1935. With merge request #1972, I adjusted the scaling of my reward functions. 

I adjusted the scales of the two functions joint_vel_l2() and action_rate_l2(), but the issue persisted. Subsequently, I incorporated additional reward components, such as action_prv_action() and joint_acc_rate_l2(); however, the residual motion remained. Below are the related functions:

```
import torch


def action_rate_new(actions: torch.Tensor, prv_actions: torch.Tensor, prv_prv_actions: torch.Tensor) -> torch.Tensor:
    """Penalize the second-order action difference (action acceleration) using L2 squared kernel."""
    return torch.sum(torch.square(actions - 2 * prv_actions + prv_prv_actions), dim=1)


def action_rate_l2(action: torch.Tensor, prv_action: torch.Tensor) -> torch.Tensor:
    """Penalize the change between consecutive actions using L2 squared kernel."""
    return torch.sum(torch.square(action - prv_action), dim=1)


def joint_vel_l2(joint_vel: torch.Tensor, joint_ids: list[int]) -> torch.Tensor:
    """Penalize joint velocities on the articulation using L2 squared kernel."""
    return torch.sum(torch.square(joint_vel[:, joint_ids]), dim=1)


def joint_acc_l2(joint_acc: torch.Tensor, joint_ids: list[int]) -> torch.Tensor:
    """Penalize joint accelerations on the articulation using L2 squared kernel."""
    return torch.sum(torch.square(joint_acc[:, joint_ids]), dim=1)
```
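To illustrate how the first-order and second-order action penalties above differ, here is a small sketch on made-up action histories. The two functions are copied from the block above; the tensors are hypothetical and chosen only so that one environment ramps smoothly while the other oscillates.

```python
import torch


def action_rate_l2(action: torch.Tensor, prv_action: torch.Tensor) -> torch.Tensor:
    """First-order penalty: squared change between consecutive actions."""
    return torch.sum(torch.square(action - prv_action), dim=1)


def action_rate_new(actions: torch.Tensor, prv_actions: torch.Tensor, prv_prv_actions: torch.Tensor) -> torch.Tensor:
    """Second-order penalty: squared second finite difference of the actions."""
    return torch.sum(torch.square(actions - 2 * prv_actions + prv_prv_actions), dim=1)


# Hypothetical action histories for 2 environments with 3 action dimensions each.
# Env 0: a smooth ramp (constant step of 0.1 per control step).
# Env 1: an oscillation that flips sign every step (jitter).
prv_prv = torch.tensor([[0.0, 0.0, 0.0], [0.1, 0.1, 0.1]])
prv = torch.tensor([[0.1, 0.1, 0.1], [-0.1, -0.1, -0.1]])
cur = torch.tensor([[0.2, 0.2, 0.2], [0.1, 0.1, 0.1]])

first_order = action_rate_l2(cur, prv)
second_order = action_rate_new(cur, prv, prv_prv)
print("first-order penalty: ", first_order)
print("second-order penalty:", second_order)
```

The first-order term penalizes both environments, including the smooth ramp, while the second-order term is zero for the ramp and large for the oscillation. That is why the two terms are not interchangeable when the goal is to suppress shaking without discouraging deliberate motion.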

I also experimented with increasing the rollout length, the episode length (episode_length_s), and the mini-batch size, but the residual motion persisted.

If the outcome shown in the video is considered successful, what additional steps are required for the Sim2Real transfer?

Thanks a bunch! I’ve really appreciated all the help from the team.

https://github.com/user-attachments/assets/c7f38f1b-c575-4fb8-ba15-e82523ef3956
 


