---
license: mit
tags:
- robotics
- multimodal
- finetuning
- vla
---

# Model Card

These are the model checkpoints used in the paper *VLA-OS: Structuring and Dissecting Planning Representations and Paradigms in Vision-Language-Action Models*. We currently release the Qwen2.5 VLM checkpoints as well as the networks required for training. All checkpoints will be released after the paper is accepted.

## Source

- Project Page: https://nus-lins-lab.github.io/vlaos/
- Paper: https://arxiv.org/abs/2506.17561
- Code: https://github.com/HeegerGao/VLA-OS
- Data: https://huggingface.co/datasets/Linslab/VLA-OS-Dataset

## Usage

Ensure you have Git LFS installed:

```bash
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt-get install git-lfs
git lfs install
```

Then download this repository:

```bash
git clone https://huggingface.co/Linslab/VLA-OS
```

Alternatively, the repository can be fetched with the `huggingface_hub` Python library; a sketch is provided at the end of this card.

## Model Description

Please refer to the codebase for a detailed description and usage instructions.

## Citation

If you find our work helpful, please cite us:

```bibtex
@article{gao2025vlaos,
  title   = {VLA-OS: Structuring and Dissecting Planning Representations and Paradigms in Vision-Language-Action Models},
  author  = {Gao, Chongkai and Liu, Zixuan and Chi, Zhenghao and Huang, Junshan and Fei, Xin and Hou, Yiwen and Zhang, Yuxuan and Lin, Yudi and Fang, Zhirui and Jiang, Zeyu and Shao, Lin},
  journal = {arXiv preprint arXiv:2506.17561},
  year    = {2025},
  url     = {https://arxiv.org/abs/2506.17561}
}
```

Thank you!
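
As referenced in the Usage section, below is a minimal sketch for downloading the checkpoints with the `huggingface_hub` Python library instead of Git LFS. The `local_dir` value is an illustrative choice, not part of the original instructions:

```python
# Minimal sketch: download the VLA-OS checkpoints via the Hugging Face Hub API.
# Requires: pip install huggingface_hub
from huggingface_hub import snapshot_download

# snapshot_download fetches every file in the repository and returns the local path.
# local_dir is illustrative; omit it to use the default Hub cache instead.
local_path = snapshot_download(repo_id="Linslab/VLA-OS", local_dir="VLA-OS")
print(f"Checkpoints downloaded to: {local_path}")
```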