Tags: Robotics · Transformers · Human · Pretraining · Manipulation

Modularized Cross-Embodiment Transformer (MXT) – Pretrained Models from Human2LocoMan

Yaru Niu1,*    Yunzhe Zhang1,*    Mingyang Yu1    Changyi Lin1    Chenhao Li1    Yikai Wang1
Yuxiang Yang2    Wenhao Yu2    Tingnan Zhang2    Zhenzhen Li3    Jonathan Francis1,3    Bingqing Chen3   
Jie Tan2    Ding Zhao1   
1Carnegie Mellon University    2Google DeepMind    3Bosch Center for AI   
*Equal contribution

Robotics: Science and Systems (RSS) 2025
Website | Paper | Code



Model Description


Our learning framework is designed to efficiently utilize data from both human and robot sources while accounting for the modality-specific distributions unique to each embodiment. To this end, we propose a modularized design, the Modularized Cross-Embodiment Transformer (MXT). MXT consists of three groups of modules: tokenizers, a Transformer trunk, and detokenizers. The tokenizers act as encoders, mapping embodiment-specific observation modalities to tokens in a shared latent space, while the detokenizers translate the output tokens from the trunk into action modalities in the action space of each embodiment. The tokenizers and detokenizers are specific to a single embodiment and are reinitialized for each new embodiment, whereas the trunk is shared across all embodiments and reused to transfer the policy between embodiments. A structural sketch of this layout is given below.
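The sketch below illustrates this module layout in PyTorch. It is an illustration only, not the released implementation (see the Code link above): the class name MXTSketch, the use of plain linear layers as stand-ins for the real modality encoders and action heads, and all dimensions are our assumptions.

import torch
import torch.nn as nn


class MXTSketch(nn.Module):
    """Illustrative MXT layout: per-embodiment tokenizers and detokenizers
    around a Transformer trunk that is shared across embodiments."""

    def __init__(self, obs_dims: dict, act_dims: dict,
                 d_model: int = 256, n_heads: int = 8, n_layers: int = 4):
        super().__init__()
        self.d_model = d_model
        # Embodiment-specific tokenizers: encode each embodiment's
        # observation modalities into latent tokens.
        self.tokenizers = nn.ModuleDict(
            {name: nn.Linear(dim, d_model) for name, dim in obs_dims.items()}
        )
        # Transformer trunk shared across all embodiments.
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.trunk = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Embodiment-specific detokenizers: decode trunk output tokens
        # into each embodiment's action space.
        self.detokenizers = nn.ModuleDict(
            {name: nn.Linear(d_model, dim) for name, dim in act_dims.items()}
        )

    def add_embodiment(self, name: str, obs_dim: int, act_dim: int) -> None:
        # For a new embodiment, the tokenizer and detokenizer are freshly
        # initialized, while the shared trunk keeps its pretrained weights.
        self.tokenizers[name] = nn.Linear(obs_dim, self.d_model)
        self.detokenizers[name] = nn.Linear(self.d_model, act_dim)

    def forward(self, embodiment: str, obs: torch.Tensor) -> torch.Tensor:
        tokens = self.tokenizers[embodiment](obs)     # (B, T, d_model)
        latent = self.trunk(tokens)                   # shared trunk
        return self.detokenizers[embodiment](latent)  # embodiment actions


# Example: pretrain on human data, then attach a new robot embodiment so
# the trunk can be transferred while its interface modules are reset.
model = MXTSketch(obs_dims={"human": 64}, act_dims={"human": 16})
actions = model("human", torch.randn(2, 10, 64))      # -> (2, 10, 16)
model.add_embodiment("locoman", obs_dim=48, act_dim=12)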

We provide MXT checkpoints pretrained on human data, along with their corresponding config files. Specifically, pour.ckpt is pretrained on the human pouring dataset; scoop.ckpt on the human scooping dataset; shoe_org.ckpt on the human unimanual and bimanual shoe organization dataset; and toy_collect.ckpt on the human unimanual and bimanual toy collection dataset.
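The checkpoints can be fetched directly from this repository on the Hub. The repo id and checkpoint filenames below come from this card; the exact structure of the saved checkpoint, and how it plugs into the MXT code together with its config file, is defined by the official codebase, so the inspection step here is only a sketch.

import torch
from huggingface_hub import hf_hub_download

# Download one of the pretrained checkpoints listed above.
ckpt_path = hf_hub_download(repo_id="chrisyrniu/mxt", filename="pour.ckpt")

# Load on CPU and inspect the contents; the checkpoint layout is
# determined by the MXT training code.
state = torch.load(ckpt_path, map_location="cpu")
if isinstance(state, dict):
    print("checkpoint keys:", list(state)[:10])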

Citation

If you find this work helpful, please consider citing the paper:

@inproceedings{niu2025human2locoman,
  title={Human2LocoMan: Learning Versatile Quadrupedal Manipulation with Human Pretraining},
  author={Niu, Yaru and Zhang, Yunzhe and Yu, Mingyang and Lin, Changyi and Li, Chenhao and Wang, Yikai and Yang, Yuxiang and Yu, Wenhao and Zhang, Tingnan and Li, Zhenzhen and Francis, Jonathan and Chen, Bingqing and Tan, Jie and Zhao, Ding},
  booktitle={Robotics: Science and Systems (RSS)},
  year={2025}
}