Improve model card: Add pipeline tag and library name (#1)
- Improve model card: Add pipeline tag and library name (5a1ca879fcdc3b9123421f457a7f74a6253201e6)
Co-authored-by: Niels Rogge <[email protected]>
README.md
````diff
@@ -1,6 +1,9 @@
 ---
 license: apache-2.0
+pipeline_tag: image-text-to-text
+library_name: transformers
 ---
+
 ### UI-Venus
 This repository contains the UI-Venus model from the report [UI-Venus: Building High-performance UI Agents with RFT](https://arxiv.org/abs/2508.10833). UI-Venus is a native UI agent based on the Qwen2.5-VL multimodal large language model, designed to perform precise GUI element grounding and effective navigation using only screenshots as input. It achieves state-of-the-art performance through Reinforcement Fine-Tuning (RFT) with high-quality training data. More inference details and usage guides are available in the GitHub repository. We will continue to update results on standard benchmarks including Screenspot-v2/Pro and AndroidWorld.
 
@@ -34,12 +37,8 @@ Key innovations include:
 - **Efficient Data Cleaning**: Trained on several hundred thousand high-quality samples to ensure robustness.
 - **Self-Evolving Trajectory History Alignment & Sparse Action Enhancement**: Improves reasoning coherence and action distribution for better long-horizon planning.
 
-
-
-
-
 ---
-##
+## Installation
 
 First, install the required dependencies:
 
@@ -48,9 +47,7 @@ pip install transformers==4.49.0 qwen-vl-utils
 ```
 ---
 
-
-
-## Quick Start
+## Quick Start
 ```python
 from transformers import Qwen2_5_VLForConditionalGeneration, AutoTokenizer, AutoProcessor
 from typing import Dict, Tuple, Any
@@ -227,7 +224,6 @@ This is the compressed package of validation trajectories for **AndroidWorld**,
 
 > **Table:** Performance comparison on **AndroidWorld** for end-to-end models. Our UI-Venus-Navi-72B achieves state-of-the-art performance, outperforming all baseline methods across different settings.
 
-
 ### Results on AndroidControl and GUI-Odyssey
 
 | Models | AndroidControl-Low<br>Type Acc. | AndroidControl-Low<br>Step SR | AndroidControl-High<br>Type Acc. | AndroidControl-High<br>Step SR | GUI-Odyssey<br>Type Acc. | GUI-Odyssey<br>Step SR |
@@ -253,7 +249,6 @@ This is the compressed package of validation trajectories for **AndroidWorld**,
 
 > **Table:** Performance comparison on offline UI navigation datasets including AndroidControl and GUI-Odyssey. Note that models with * are reproduced.
 
-
 # Citation
 Please consider citing if you find our work useful:
 ```plain
@@ -266,4 +261,4 @@ Please consider citing if you find our work useful:
 primaryClass={cs.CV},
 url={https://arxiv.org/abs/2508.10833},
 }
-```
+```
````
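The substantive change in this commit is the addition of two keys to the YAML metadata block at the top of README.md: `pipeline_tag: image-text-to-text` and `library_name: transformers`. The Hugging Face Hub reads this block to label the model's task and its associated library. The following minimal sketch extracts the flat `key: value` pairs from the header before and after the commit and reports which keys were added. `parse_front_matter`, `README_BEFORE`, and `README_AFTER` are hypothetical names introduced only for this illustration. The helper is a deliberately simplified parser that handles single-line scalar values only. It is not a full YAML implementation.

```python
import re
from typing import Dict

# Abbreviated copies of the README.md header before and after this commit
# (hypothetical constants; only the metadata block matters here).
README_BEFORE = """---
license: apache-2.0
---
### UI-Venus
"""

README_AFTER = """---
license: apache-2.0
pipeline_tag: image-text-to-text
library_name: transformers
---

### UI-Venus
"""


def parse_front_matter(text: str) -> Dict[str, str]:
    """Hypothetical helper: return flat `key: value` pairs from a leading
    `---`-delimited block. Simplified on purpose: no lists, nesting, or
    YAML quoting rules are handled."""
    match = re.match(r"---\n(.*?)\n---\n", text, re.DOTALL)
    if match is None:
        return {}  # no front matter at the start of the text
    fields: Dict[str, str] = {}
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that contain no colon
            fields[key.strip()] = value.strip()
    return fields


before = parse_front_matter(README_BEFORE)
after = parse_front_matter(README_AFTER)

# Keys present after the commit but not before it.
added = {k: v for k, v in after.items() if k not in before}
print(added)  # {'pipeline_tag': 'image-text-to-text', 'library_name': 'transformers'}
```

For real model cards, a proper YAML parser or the model card utilities in `huggingface_hub` would be the safer choice, because front matter can contain lists, nested mappings, and quoted strings that this sketch ignores.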