arxiv:2508.05731

InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy Optimization

Published on Aug 7 · Submitted by SiriusL on Aug 11

Abstract

Adaptive Exploration Policy Optimization (AEPO) enhances semantic alignment in Multimodal Large Language Models (MLLMs) for GUI interaction, yielding relative improvements of up to 9.0% over a naive RLVR baseline on GUI grounding benchmarks.

AI-generated summary

The emergence of Multimodal Large Language Models (MLLMs) has propelled the development of autonomous agents that operate on Graphical User Interfaces (GUIs) using pure visual input. A fundamental challenge is robustly grounding natural language instructions. This requires precise spatial alignment, which accurately locates the coordinates of each element, and, more critically, correct semantic alignment, which matches the instruction to the functionally appropriate UI element. Although Reinforcement Learning with Verifiable Rewards (RLVR) has proven effective at improving spatial alignment for these MLLMs, we find that inefficient exploration bottlenecks semantic alignment, preventing models from learning difficult semantic associations. To address this exploration problem, we present Adaptive Exploration Policy Optimization (AEPO), a new policy optimization framework. AEPO employs a multi-answer generation strategy to enforce broader exploration, which is then guided by a theoretically grounded Adaptive Exploration Reward (AER) function derived from first principles of efficiency, η = U/C. Our AEPO-trained models, InfiGUI-G1-3B and InfiGUI-G1-7B, establish new state-of-the-art results across multiple challenging GUI grounding benchmarks, achieving significant relative improvements of up to 9.0% over the naive RLVR baseline on benchmarks designed to test generalization and semantic understanding. Resources are available at https://github.com/InfiXAI/InfiGUI-G1.

Community

Comment from the paper author and submitter:

This paper tackles a critical bottleneck in GUI agent development: semantic alignment. While many models are proficient at spatial alignment (precisely locating a known UI element), they often fail at semantic alignment, which involves understanding an instruction's intent to identify the functionally correct element. The authors identify that standard reinforcement learning methods suffer from inefficient exploration, getting stuck on high-confidence but incorrect selections. Their core solution, Adaptive Exploration Policy Optimization (AEPO), directly addresses this by compelling the model to generate multiple diverse candidate answers in a single forward pass. This wider exploration is guided by a novel Adaptive Exploration Reward (AER) function, which is derived from efficiency principles (η=U/C) and rewards the model not just for finding the correct answer, but for finding it with high confidence (a low rank k among the candidates) and minimal effort (a small number of proposals N).
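To make the reward shaping concrete, below is a minimal, hypothetical Python sketch of how an AER-style score could combine the rank k of the correct candidate and the number of proposals N under the efficiency view η = U/C described above. The function name and the specific functional form (1/k utility, cost linear in N) are illustrative assumptions, not the formula derived in the paper.

```python
from typing import Optional

def adaptive_exploration_reward(rank_k: Optional[int], num_proposals: int) -> float:
    """Illustrative sketch of an Adaptive Exploration Reward (AER).

    AEPO has the policy emit N candidate groundings in a single forward pass
    and scores them with a reward derived from an efficiency ratio eta = U / C:
    utility U is higher when the correct element appears at a low rank k
    (an earlier, more confident proposal), and cost C grows with the number
    of proposals N. The exact form below is an assumption for illustration.
    """
    if rank_k is None:
        # None of the N proposals hit the target element: no reward.
        return 0.0
    utility = 1.0 / rank_k        # rank 1 (most confident proposal) earns the most
    cost = float(num_proposals)   # proposing more candidates costs more
    return utility / cost         # eta = U / C


# Example: a correct answer ranked first among a single proposal scores 1.0,
# while the same answer buried at rank 3 of 8 proposals scores ~0.04.
print(adaptive_exploration_reward(rank_k=1, num_proposals=1))     # 1.0
print(adaptive_exploration_reward(rank_k=3, num_proposals=8))     # 0.0416...
print(adaptive_exploration_reward(rank_k=None, num_proposals=8))  # 0.0
```

Shaped this way, the reward is maximized when the model answers correctly with a single confident proposal, while still giving partial credit when a wider set of candidates happens to contain the correct element, which is what encourages broader but efficient exploration.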

