SiriusL committed (verified) · Commit 5e95c7f · Parent: f521761

Update README.md

Files changed (1): README.md (+44, −0)
For more details on the methodology and evaluation, please refer to our [paper](https://arxiv.org/abs/2508.05731) and [repository](https://github.com/InfiXAI/InfiGUI-G1).

## Results

Our InfiGUI-G1 models, trained with the AEPO framework, establish new state-of-the-art results among open-source models across a diverse and challenging set of GUI grounding benchmarks.

### MMBench-GUI (L2) Results

On the comprehensive MMBench-GUI benchmark, which evaluates performance across various platforms and instruction complexities, our InfiGUI-G1 models establish new state-of-the-art results for open-source models in their respective size categories.

<div align="center">
<img src="https://raw.githubusercontent.com/InfiXAI/InfiGUI-G1/main/assets/results_mmbench-gui.png" width="90%" alt="MMBench-GUI Results">
</div>

### ScreenSpot-Pro Results

On the challenging ScreenSpot-Pro benchmark, designed to test semantic understanding on high-resolution professional software, InfiGUI-G1 demonstrates significant improvements, particularly on icon-based grounding tasks. This highlights AEPO's effectiveness in enhancing semantic alignment by associating abstract visual symbols with their functions.

<div align="center">
<img src="https://raw.githubusercontent.com/InfiXAI/InfiGUI-G1/main/assets/results_screenspot-pro.png" width="90%" alt="ScreenSpot-Pro Results">
</div>

### UI-Vision (Element Grounding) Results

InfiGUI-G1 shows strong generalization on the UI-Vision benchmark, which is designed to test robustness across a wide variety of unseen desktop applications. Its high performance confirms that our AEPO framework fosters a robust understanding rather than overfitting to the training data.

<div align="center">
<img src="https://raw.githubusercontent.com/InfiXAI/InfiGUI-G1/main/assets/results_ui-vision.png" width="90%" alt="UI-Vision Results">
</div>

### UI-I2E-Bench Results

To further probe semantic reasoning, we evaluated on UI-I2E-Bench, a benchmark featuring a high proportion of implicit instructions that require reasoning beyond direct text matching. Our model's strong performance underscores AEPO's ability to handle complex, indirect commands.

<div align="center">
<img src="https://raw.githubusercontent.com/InfiXAI/InfiGUI-G1/main/assets/results_i2e-bench.png" width="90%" alt="UI-I2E-Bench Results">
</div>

### ScreenSpot-V2 Results

On the widely used ScreenSpot-V2 benchmark, which provides comprehensive coverage across mobile, desktop, and web platforms, InfiGUI-G1 consistently outperforms strong baselines, demonstrating the broad applicability and data efficiency of our approach.

<div align="center">
<img src="https://raw.githubusercontent.com/InfiXAI/InfiGUI-G1/main/assets/results_screenspot-v2.png" width="90%" alt="ScreenSpot-V2 Results">
</div>

## Citation Information

If you find this work useful, we would be grateful if you consider citing the following papers: