
SpiritSight Agent: Advanced GUI Agent with One Look

📄 Paper • 🤖 Models • 📚 Datasets (Coming soon…)

Introduction

SpiritSight-Agent is a vision-based, end-to-end GUI agent that excels in GUI navigation tasks across various GUI platforms.

Models

We recommend fine-tuning one of the base models below on your own data.

Model                        Checkpoint    Size   License
SpiritSight-Agent-2B-base    🤗 HF Link    2B     InternVL
SpiritSight-Agent-8B-base    🤗 HF Link    8B     InternVL
SpiritSight-Agent-26B-base   🤗 HF Link    26B    InternVL
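As a rough guide when choosing among these checkpoints, the memory needed just to hold the weights is about parameter count × bytes per parameter. The sketch below is only an approximation, assuming the nominal sizes from the table (2B, 8B, 26B) and bf16 weights. It ignores activations, the KV cache and image tokens, so real inference needs more. The helper names are hypothetical and not part of this repo.

```python
# Rough estimate of memory needed just to hold model weights.
# Assumption: parameter counts are the nominal sizes from the table (2B, 8B, 26B);
# real checkpoints differ slightly, and activations / KV cache / image tokens are NOT included.

BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2, "int8": 1}

NOMINAL_PARAMS = {
    "SpiritSight-Agent-2B-base": 2e9,
    "SpiritSight-Agent-8B-base": 8e9,
    "SpiritSight-Agent-26B-base": 26e9,
}


def weight_memory_gb(num_params: float, dtype: str = "bf16") -> float:
    """Return weight memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for name, params in NOMINAL_PARAMS.items():
        print(f"{name}: ~{weight_memory_gb(params):.0f} GB in bf16 (weights only)")
```

By this estimate the three checkpoints need roughly 4, 16 and 52 GB for bf16 weights alone, before any runtime overhead.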

Datasets

Coming soon.

Inference

conda create -n spiritsight-agent python=3.9
conda activate spiritsight-agent

pip install -r requirements.txt
pip install flash-attn==2.3.6 --no-build-isolation

python infer_SSAgent-26B.py
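Because the setup pins Python 3.9 and flash-attn 2.3.6, a quick pre-flight check can catch a mismatched environment before the inference script fails partway through. This is a minimal sketch, not part of the SpiritSight repo: `check_environment` and `installed_versions` are hypothetical helpers, and it assumes the installed distribution can be found under the name passed to pip (`flash-attn`).

```python
# Pre-flight check for the pinned environment (Python 3.9, flash-attn 2.3.6).
# check_environment / installed_versions are hypothetical helpers, not part of the SpiritSight repo.
import sys
from importlib import metadata
from typing import Dict, Iterable, List, Optional, Tuple

REQUIRED_PYTHON = (3, 9)
REQUIRED_PACKAGES = {"flash-attn": "2.3.6"}  # distribution name as given to pip


def check_environment(
    python_version: Tuple[int, int],
    installed: Dict[str, Optional[str]],
) -> List[str]:
    """Return human-readable problems; an empty list means the pins match."""
    problems = []
    if tuple(python_version) != REQUIRED_PYTHON:
        problems.append(
            f"Python {python_version[0]}.{python_version[1]} found, "
            f"expected {REQUIRED_PYTHON[0]}.{REQUIRED_PYTHON[1]}"
        )
    for package, wanted in REQUIRED_PACKAGES.items():
        found = installed.get(package)
        if found is None:
            problems.append(f"{package} is not installed (expected {wanted})")
        elif found != wanted:
            problems.append(f"{package} {found} found, expected {wanted}")
    return problems


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Look up installed distribution versions; None when a package is missing."""
    versions: Dict[str, Optional[str]] = {}
    for package in packages:
        try:
            versions[package] = metadata.version(package)
        except metadata.PackageNotFoundError:
            versions[package] = None
    return versions


if __name__ == "__main__":
    issues = check_environment(
        tuple(sys.version_info[:2]), installed_versions(REQUIRED_PACKAGES)
    )
    for issue in issues:
        print("WARNING:", issue)
    if not issues:
        print("Environment matches the pinned versions.")
```

Run it inside the spiritsight-agent environment before launching the inference script; the comparison logic takes plain values so it can be checked without a GPU or flash-attn installed.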

Citation

If you find this repo useful for your research, please cite our paper:

@misc{huang2025spiritsightagentadvancedgui,
      title={SpiritSight Agent: Advanced GUI Agent with One Look}, 
      author={Zhiyuan Huang and Ziming Cheng and Junting Pan and Zhaohui Hou and Mingjie Zhan},
      year={2025},
      eprint={2503.03196},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.03196},
}

Acknowledgments

We thank the following amazing projects that truly inspired us:
