zengw nielsr HF Staff committed on
Commit 053d65c · verified · 1 Parent(s): 908b9b5

Improve model card: Add pipeline tag and library name (#1)

- Improve model card: Add pipeline tag and library name (5a1ca879fcdc3b9123421f457a7f74a6253201e6)


Co-authored-by: Niels Rogge <[email protected]>

Files changed (1)
  1. README.md +6 -11
README.md CHANGED
@@ -1,6 +1,9 @@
  ---
  license: apache-2.0
+ pipeline_tag: image-text-to-text
+ library_name: transformers
  ---
+
  ### UI-Venus
  This repository contains the UI-Venus model from the report [UI-Venus: Building High-performance UI Agents with RFT](https://arxiv.org/abs/2508.10833). UI-Venus is a native UI agent based on the Qwen2.5-VL multimodal large language model, designed to perform precise GUI element grounding and effective navigation using only screenshots as input. It achieves state-of-the-art performance through Reinforcement Fine-Tuning (RFT) with high-quality training data. More inference details and usage guides are available in the GitHub repository. We will continue to update results on standard benchmarks including Screenspot-v2/Pro and AndroidWorld.

@@ -34,12 +37,8 @@ Key innovations include:
  - **Efficient Data Cleaning**: Trained on several hundred thousand high-quality samples to ensure robustness.
  - **Self-Evolving Trajectory History Alignment & Sparse Action Enhancement**: Improves reasoning coherence and action distribution for better long-horizon planning.

-
-
-
-
  ---
- ## Installation
+ ## Installation

  First, install the required dependencies:

@@ -48,9 +47,7 @@ pip install transformers==4.49.0 qwen-vl-utils
  ```
  ---

-
-
- ## Quick Start
+ ## Quick Start
  ```python
  from transformers import Qwen2_5_VLForConditionalGeneration, AutoTokenizer, AutoProcessor
  from typing import Dict, Tuple, Any
@@ -227,7 +224,6 @@ This is the compressed package of validation trajectories for **AndroidWorld**,

  > **Table:** Performance comparison on **AndroidWorld** for end-to-end models. Our UI-Venus-Navi-72B achieves state-of-the-art performance, outperforming all baseline methods across different settings.

-
  ### Results on AndroidControl and GUI-Odyssey

  | Models | AndroidControl-Low<br>Type Acc. | AndroidControl-Low<br>Step SR | AndroidControl-High<br>Type Acc. | AndroidControl-High<br>Step SR | GUI-Odyssey<br>Type Acc. | GUI-Odyssey<br>Step SR |
@@ -253,7 +249,6 @@ This is the compressed package of validation trajectories for **AndroidWorld**,

  > **Table:** Performance comparison on offline UI navigation datasets including AndroidControl and GUI-Odyssey. Note that models with * are reproduced.

-
  # Citation
  Please consider citing if you find our work useful:
  ```plain
@@ -266,4 +261,4 @@ Please consider citing if you find our work useful:
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2508.10833},
  }
- ```
+ ```
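
The substantive change in this commit is the two keys added to the README's YAML front matter, `pipeline_tag: image-text-to-text` and `library_name: transformers`. As a rough illustration only, the sketch below reads such flat `key: value` front matter with the Python standard library. The helper name `read_front_matter` and the `README_HEAD` sample are hypothetical, written for this example; the parser handles only simple one-line pairs and is not a general YAML parser or any tool the Hub itself uses.

```python
def read_front_matter(text: str) -> dict:
    """Parse flat `key: value` front matter delimited by '---' lines.

    Hypothetical helper for illustration; it handles only simple
    one-line pairs, not nested or multi-line YAML.
    """
    lines = text.splitlines()
    # Front matter must start on the very first line.
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            # Closing delimiter reached: the block is complete.
            return meta
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    # No closing delimiter found, so treat the block as absent.
    return {}


# Sample text mirroring the head of README.md after this commit.
README_HEAD = """---
license: apache-2.0
pipeline_tag: image-text-to-text
library_name: transformers
---

### UI-Venus
"""

meta = read_front_matter(README_HEAD)
print(meta)
```

A check like this could confirm that both new keys are present before a model card is pushed; for real-world cards with nested metadata, a full YAML parser would be the safer choice.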