Motivation
The standard approach to bridging the embodiment gap between humans and humanoids is kinematic retargeting from the source human motion to the target humanoid embodiment. This practice overlooks the glaring artifacts that retargeting introduces, such as foot sliding, ground penetration, and self-penetration that renders motions physically impossible, and instead forces the RL policy to imitate these physically infeasible motions while itself remaining subject to physical constraints. Prior work has shown that although policies can be trained in simulation on retargeted data with severe artifacts, transferring them to the real world demands extensive trial-and-error, reward shaping, and parameter tuning. Given this practice, our hypothesis is that with sufficient engineering of the reward function and domain randomization, the artifacts introduced by retargeting can be largely mitigated or removed; without such engineering effort, however, the quality of the retargeting results plays a significant role in the performance of the resulting policy.