Test-Time Reinforcement Learning for GUI Grounding via Region Consistency
Abstract
GUI-RC and GUI-RCPO enhance GUI grounding accuracy by leveraging spatial consistency and reinforcement learning without additional training data.
Graphical User Interface (GUI) grounding, the task of mapping natural language instructions to precise screen coordinates, is fundamental to autonomous GUI agents. While existing methods achieve strong performance through extensive supervised training or reinforcement learning with labeled rewards, they remain constrained by the cost and availability of pixel-level annotations. We observe that when models generate multiple predictions for the same GUI element, the spatial overlap patterns reveal implicit confidence signals that can guide more accurate localization. Leveraging this insight, we propose GUI-RC (Region Consistency), a test-time scaling method that constructs spatial voting grids from multiple sampled predictions to identify consensus regions where models show highest agreement. Without any training, GUI-RC improves accuracy by 2-3% across various architectures on ScreenSpot benchmarks. We further introduce GUI-RCPO (Region Consistency Policy Optimization), which transforms these consistency patterns into rewards for test-time reinforcement learning. By computing how well each prediction aligns with the collective consensus, GUI-RCPO enables models to iteratively refine their outputs on unlabeled data during inference. Extensive experiments demonstrate the generality of our approach: GUI-RC boosts Qwen2.5-VL-3B-Instruct from 80.11% to 83.57% on ScreenSpot-v2, while GUI-RCPO further improves it to 85.14% through self-supervised optimization. Our approach reveals the untapped potential of test-time scaling and test-time reinforcement learning for GUI grounding, offering a promising path toward more robust and data-efficient GUI agents.
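For readers who want the mechanics in concrete terms, below is a minimal Python sketch of the two ideas the abstract describes: a spatial voting grid built from multiple sampled box predictions with a peak-agreement consensus region (the GUI-RC idea), and a region-consistency score that rewards each sample by its vote density (the GUI-RCPO idea). This is our own illustration under stated assumptions, not the authors' released code: the box input format, the pixel-resolution grid, the peak-cell consensus rule, and all function names (`build_vote_grid`, `region_consistency_reward`, ...) are hypothetical.

```python
import numpy as np

def build_vote_grid(boxes, width, height):
    """Accumulate a spatial voting grid from sampled box predictions.

    Each predicted box (x1, y1, x2, y2) casts one vote onto every pixel
    it covers; cells covered by many samples mark regions of agreement.
    """
    grid = np.zeros((height, width), dtype=np.int32)
    for x1, y1, x2, y2 in boxes:
        x1, y1 = max(0, int(x1)), max(0, int(y1))
        x2, y2 = min(width, int(x2)), min(height, int(y2))
        if x2 > x1 and y2 > y1:
            grid[y1:y2, x1:x2] += 1
    return grid

def consensus_region(grid):
    """Bounding box (x1, y1, x2, y2) of the cells holding the maximum
    vote count -- one simple way to define the consensus region."""
    ys, xs = np.where(grid == grid.max())
    return xs.min(), ys.min(), xs.max() + 1, ys.max() + 1

def gui_rc_point(boxes, width, height):
    """Test-time scaling in the GUI-RC spirit: vote across samples, find
    the consensus region, and output its center as the click point."""
    x1, y1, x2, y2 = consensus_region(build_vote_grid(boxes, width, height))
    return (x1 + x2) / 2.0, (y1 + y2) / 2.0

def region_consistency_reward(box, grid):
    """A label-free reward in the GUI-RCPO spirit: mean vote density inside
    the predicted box, normalized by the peak count, so samples that align
    with the collective consensus score close to 1."""
    x1, y1, x2, y2 = (int(v) for v in box)
    patch = grid[max(0, y1):y2, max(0, x1):x2]
    if patch.size == 0 or grid.max() == 0:
        return 0.0
    return float(patch.mean()) / float(grid.max())

if __name__ == "__main__":
    # Hypothetical samples for one instruction on a 1920x1080 screen:
    # two near-duplicate predictions (the likely target) and one outlier.
    samples = [(100, 200, 180, 240), (105, 198, 182, 238), (300, 400, 360, 440)]
    grid = build_vote_grid(samples, 1920, 1080)
    print("GUI-RC click point:", gui_rc_point(samples, 1920, 1080))
    print("rewards:", [round(region_consistency_reward(b, grid), 3) for b in samples])
```

In a GUI-RCPO-style loop, such consistency scores would stand in for labeled rewards in an ordinary policy-gradient update over groups of samples, which is how the abstract describes models refining themselves on unlabeled data at inference time.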
Community
We are happy to introduce GUI-RC and GUI-RCPO, two methods that improve GUI grounding accuracy by leveraging spatial consistency in model predictions, with no additional training data. Together, test-time consensus voting and self-supervised reinforcement learning boost accuracy by 2-5%.
Project Page: https://zju-real.github.io/gui-rcpo/
GitHub: https://github.com/zju-real/gui-rcpo
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- GUI-G$^2$: Gaussian Reward Modeling for GUI Grounding (2025)
- InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy Optimization (2025)
- UI-AGILE: Advancing GUI Agents with Effective Reinforcement Learning and Precise Inference-Time Grounding (2025)
- GuirlVG: Incentivize GUI Visual Grounding via Empirical Exploration on Reinforcement Learning (2025)
- R-VLM: Region-Aware Vision Language Model for Precise GUI Grounding (2025)
- NaviMaster: Learning a Unified Policy for GUI and Embodied Navigation Tasks (2025)
- GTA1: GUI Test-time Scaling Agent (2025)