Tags: Robotics · Transformers · Human · Pretraining · Manipulation

Modularized Cross-Embodiment Transformer (MXT) – Pretrained Models from Human2LocoMan

Yaru Niu1,*    Yunzhe Zhang1,*    Mingyang Yu1    Changyi Lin1    Chenhao Li1    Yikai Wang1
Yuxiang Yang2    Wenhao Yu2    Tingnan Zhang2    Zhenzhen Li3    Jonathan Francis1,3    Bingqing Chen3   
Jie Tan2    Ding Zhao1   
1Carnegie Mellon University    2Google DeepMind    3Bosch Center for AI   
*Equal contribution

Robotics: Science and Systems (RSS) 2025
Website | Paper | Code



Model Description


Our learning framework is designed to efficiently utilize data from both human and robot sources while accounting for the modality-specific distributions unique to each embodiment. To this end, we propose a modularized design, the Modularized Cross-Embodiment Transformer (MXT). MXT consists of three groups of modules: tokenizers, a Transformer trunk, and detokenizers. The tokenizers act as encoders, mapping embodiment-specific observation modalities to tokens in a shared latent space, while the detokenizers translate the output tokens from the trunk into action modalities in the action space of each embodiment. The tokenizers and detokenizers are specific to a single embodiment and are reinitialized for each new embodiment, whereas the trunk is shared across all embodiments and reused to transfer the policy between embodiments. A structural sketch of this layout is given below.
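The sketch below illustrates this module layout in PyTorch. It is an illustration only, not the released implementation (see the Code link above): the class name MXTSketch, the use of plain linear layers as stand-ins for the real modality encoders and action heads, and all dimensions are our assumptions.

import torch
import torch.nn as nn


class MXTSketch(nn.Module):
    """Illustrative MXT layout: per-embodiment tokenizers and detokenizers
    around a Transformer trunk that is shared across embodiments."""

    def __init__(self, obs_dims: dict, act_dims: dict,
                 d_model: int = 256, n_heads: int = 8, n_layers: int = 4):
        super().__init__()
        self.d_model = d_model
        # Embodiment-specific tokenizers: encode each embodiment's
        # observation modalities into latent tokens.
        self.tokenizers = nn.ModuleDict(
            {name: nn.Linear(dim, d_model) for name, dim in obs_dims.items()}
        )
        # Transformer trunk shared across all embodiments.
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.trunk = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Embodiment-specific detokenizers: decode trunk output tokens
        # into each embodiment's action space.
        self.detokenizers = nn.ModuleDict(
            {name: nn.Linear(d_model, dim) for name, dim in act_dims.items()}
        )

    def add_embodiment(self, name: str, obs_dim: int, act_dim: int) -> None:
        # For a new embodiment, the tokenizer and detokenizer are freshly
        # initialized, while the shared trunk keeps its pretrained weights.
        self.tokenizers[name] = nn.Linear(obs_dim, self.d_model)
        self.detokenizers[name] = nn.Linear(self.d_model, act_dim)

    def forward(self, embodiment: str, obs: torch.Tensor) -> torch.Tensor:
        tokens = self.tokenizers[embodiment](obs)     # (B, T, d_model)
        latent = self.trunk(tokens)                   # shared trunk
        return self.detokenizers[embodiment](latent)  # embodiment actions


# Example: pretrain on human data, then attach a new robot embodiment so
# the trunk can be transferred while its interface modules are reset.
model = MXTSketch(obs_dims={"human": 64}, act_dims={"human": 16})
actions = model("human", torch.randn(2, 10, 64))      # -> (2, 10, 16)
model.add_embodiment("locoman", obs_dim=48, act_dim=12)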

We provide MXT checkpoints pretrained on human data, along with their corresponding config files. Specifically, pour.ckpt is pretrained on the human pouring dataset; scoop.ckpt on the human scooping dataset; shoe_org.ckpt on the human unimanual and bimanual shoe organization dataset; and toy_collect.ckpt on the human unimanual and bimanual toy collection dataset.
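The checkpoints can be fetched directly from this repository on the Hub. The repo id and checkpoint filenames below come from this card; the exact structure of the saved checkpoint, and how it plugs into the MXT code together with its config file, is defined by the official codebase, so the inspection step here is only a sketch.

import torch
from huggingface_hub import hf_hub_download

# Download one of the pretrained checkpoints listed above.
ckpt_path = hf_hub_download(repo_id="chrisyrniu/mxt", filename="pour.ckpt")

# Load on CPU and inspect the contents; the checkpoint layout is
# determined by the MXT training code.
state = torch.load(ckpt_path, map_location="cpu")
if isinstance(state, dict):
    print("checkpoint keys:", list(state)[:10])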

Citation

If you find this work helpful, please consider citing the paper:

@inproceedings{niu2025human2locoman,
  title={Human2LocoMan: Learning Versatile Quadrupedal Manipulation with Human Pretraining},
  author={Niu, Yaru and Zhang, Yunzhe and Yu, Mingyang and Lin, Changyi and Li, Chenhao and Wang, Yikai and Yang, Yuxiang and Yu, Wenhao and Zhang, Tingnan and Li, Zhenzhen and Francis, Jonathan and Chen, Bingqing and Tan, Jie and Zhao, Ding},
  booktitle={Robotics: Science and Systems (RSS)},
  year={2025}
}