ScreenExplorer: Training a Vision-Language Model for Diverse Exploration in Open GUI World
We introduce ScreenExplorer, a VLM trained via Group Relative Policy Optimization (GRPO) in real, dynamic, and open-ended GUI environments. ScreenExplorer learns to interact effectively with the screen environment from screenshots alone, guided by a fixed instruction that encourages diverse exploration.
This repo contains the LoRA checkpoints saved during the training of ScreenExplorer-3B-E1 and ScreenExplorer-7B-E1, as well as the LoRA checkpoints of ScreenExplorer-3B-Distill.
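The checkpoints above are LoRA adapters rather than full model weights, so they are attached to the Qwen2.5-VL base model at load time. A minimal sketch of doing this with `transformers` and `peft` follows; the adapter repo id is this repository, while `adapter_subfolder` and the exact checkpoint folder names are assumptions — check the repo's file listing for the actual paths.

```python
BASE_MODEL = "Qwen/Qwen2.5-VL-3B-Instruct"  # base model (see model tree below)
ADAPTER_REPO = "niurl/ScreenExplorer"       # this repository

def load_screenexplorer(adapter_subfolder=None):
    """Load the Qwen2.5-VL base model and attach a ScreenExplorer LoRA adapter.

    `adapter_subfolder` is a hypothetical argument for selecting one
    checkpoint when the repo stores several; the folder layout is an
    assumption, not confirmed by the model card.
    """
    # Imported lazily so the sketch stays lightweight until actually invoked.
    from peft import PeftModel
    from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        BASE_MODEL, torch_dtype="auto", device_map="auto"
    )
    processor = AutoProcessor.from_pretrained(BASE_MODEL)
    # Attach the LoRA weights on top of the frozen base model.
    model = PeftModel.from_pretrained(
        model, ADAPTER_REPO, subfolder=adapter_subfolder
    )
    return model, processor
```

After loading, the model can be prompted with a screenshot plus the exploration instruction through the processor's chat template, as with any Qwen2.5-VL checkpoint.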
Citation
@misc{niu2025screenexplorertrainingvisionlanguagemodel,
title={ScreenExplorer: Training a Vision-Language Model for Diverse Exploration in Open GUI World},
author={Runliang Niu and Jinglong Ji and Yi Chang and Qi Wang},
year={2025},
eprint={2505.19095},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2505.19095},
}
Model tree for niurl/ScreenExplorer
Base model
Qwen/Qwen2.5-VL-3B-Instruct