---
license: mit
tags:
- robotics
- multimodal
- finetuning
- vla
---

# Model Card

These are the model checkpoints used in the paper *VLA-OS: Structuring and Dissecting Planning Representations and Paradigms in Vision-Language-Action Models*. We currently release the Qwen2.5 VLM checkpoints as well as the networks required for training. All checkpoints will be released after the paper is accepted.

## Source

- Project Page: https://nus-lins-lab.github.io/vlaos/
- Paper: https://arxiv.org/abs/2506.17561
- Code: https://github.com/HeegerGao/VLA-OS
- Data: https://huggingface.co/datasets/Linslab/VLA-OS-Dataset

## Usage

Ensure you have Git LFS installed:

```bash
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt-get install git-lfs
git lfs install
```

Then download this repository:

```bash
git clone https://huggingface.co/Linslab/VLA-OS
```

Alternatively, the repository can be fetched with the `huggingface_hub` Python library; a sketch is provided at the end of this card.

## Model Description

Please refer to the codebase for a detailed description and usage instructions.

## Citation

If you find our work helpful, please cite us:

```bibtex
@article{gao2025vlaos,
  title   = {VLA-OS: Structuring and Dissecting Planning Representations and Paradigms in Vision-Language-Action Models},
  author  = {Gao, Chongkai and Liu, Zixuan and Chi, Zhenghao and Huang, Junshan and Fei, Xin and Hou, Yiwen and Zhang, Yuxuan and Lin, Yudi and Fang, Zhirui and Jiang, Zeyu and Shao, Lin},
  journal = {arXiv preprint arXiv:2506.17561},
  year    = {2025},
  url     = {https://arxiv.org/abs/2506.17561}
}
```

Thank you!
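
As referenced in the Usage section, below is a minimal sketch for downloading the checkpoints with the `huggingface_hub` Python library instead of Git LFS. The `local_dir` value is an illustrative choice, not part of the original instructions:

```python
# Minimal sketch: download the VLA-OS checkpoints via the Hugging Face Hub API.
# Requires: pip install huggingface_hub
from huggingface_hub import snapshot_download

# snapshot_download fetches every file in the repository and returns the local path.
# local_dir is illustrative; omit it to use the default Hub cache instead.
local_path = snapshot_download(repo_id="Linslab/VLA-OS", local_dir="VLA-OS")
print(f"Checkpoints downloaded to: {local_path}")
```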