Update README.md
README.md
CHANGED
@@ -166,6 +166,50 @@ To reproduce the results in our paper, please refer to our repo for detailed ins
For more details on the methodology and evaluation, please refer to our [paper](https://arxiv.org/abs/2508.05731) and [repository](https://github.com/InfiXAI/InfiGUI-G1).

## Results
Our InfiGUI-G1 models, trained with the AEPO framework, establish new state-of-the-art results among open-source models across a diverse and challenging set of GUI grounding benchmarks.
### MMBench-GUI (L2) Results
On the comprehensive MMBench-GUI benchmark, which evaluates performance across various platforms and instruction complexities, our InfiGUI-G1 models establish new state-of-the-art results for open-source models in their respective size categories.
<div align="center">
<img src="https://raw.githubusercontent.com/InfiXAI/InfiGUI-G1/main/assets/results_mmbench-gui.png" width="90%" alt="MMBench-GUI Results">
</div>
### ScreenSpot-Pro Results
On the challenging ScreenSpot-Pro benchmark, designed to test semantic understanding on high-resolution professional software, InfiGUI-G1 demonstrates significant improvements, particularly on icon-based grounding tasks. This highlights AEPO's effectiveness in enhancing semantic alignment by associating abstract visual symbols with their functions.
<div align="center">
<img src="https://raw.githubusercontent.com/InfiXAI/InfiGUI-G1/main/assets/results_screenspot-pro.png" width="90%" alt="ScreenSpot-Pro Results">
</div>
### UI-Vision (Element Grounding) Results

InfiGUI-G1 shows strong generalization on the UI-Vision benchmark, which is designed to test robustness across a wide variety of unseen desktop applications. Its high performance here confirms that our AEPO framework fosters a robust understanding rather than overfitting to the training data.

<div align="center">
<img src="https://raw.githubusercontent.com/InfiXAI/InfiGUI-G1/main/assets/results_ui-vision.png" width="90%" alt="UI-Vision Results">
</div>
### UI-I2E-Bench Results
To further probe semantic reasoning, we evaluated on UI-I2E-Bench, a benchmark featuring a high proportion of implicit instructions that require reasoning beyond direct text matching. Our model's strong performance underscores AEPO's ability to handle complex, indirect commands.
<div align="center">
<img src="https://raw.githubusercontent.com/InfiXAI/InfiGUI-G1/main/assets/results_i2e-bench.png" width="90%" alt="UI-I2E-Bench Results">
</div>
### ScreenSpot-V2 Results

On the widely used ScreenSpot-V2 benchmark, which provides comprehensive coverage across mobile, desktop, and web platforms, InfiGUI-G1 consistently outperforms strong baselines, demonstrating the broad applicability and data efficiency of our approach.

<div align="center">
<img src="https://raw.githubusercontent.com/InfiXAI/InfiGUI-G1/main/assets/results_screenspot-v2.png" width="90%" alt="ScreenSpot-V2 Results">
</div>
## Citation Information
If you find this work useful, please consider citing the following papers: