Update README.md
README.md
CHANGED
@@ -14,22 +14,16 @@ pipeline_tag: image-text-to-text
 </div>

 ## Overview
-![os-

-OS-Genesis

-
-- [
-- [OS-Atlas-Base-4B](https://huggingface.co/OS-Copilot/OS-Atlas-Base-4B)

-
-- [OS-Atlas-Pro-7B](https://huggingface.co/OS-Copilot/OS-Atlas-Pro-7B)
-- [OS-Atlas-Pro-4B](https://huggingface.co/OS-Copilot/OS-Atlas-Pro-4B)

-## Quick Start
-OS-Atlas-Base-4B is a GUI grounding model finetuned from [InternVL2-4B](https://huggingface.co/OpenGVLab/InternVL2-4B).

-**Notes:** Our models accept images of any size as input. The model outputs are normalized to relative coordinates within a 0-1000 range (either a center point or a bounding box defined by top-left and bottom-right coordinates). For visualization, please remember to convert these relative coordinates back to the original image dimensions.

 ### Inference Example
 First, install the `transformers` library:
@@ -135,21 +129,19 @@ tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, use_fast
 pixel_values = load_image('./web_dfacd48d-d2c2-492f-b94c-41e6a34ea99f.png', max_num=6).to(torch.bfloat16).cuda()
 generation_config = dict(max_new_tokens=1024, do_sample=True)

-question = "
 response, history = model.chat(tokenizer, pixel_values, question, generation_config, history=None, return_history=True)
 print(f'User: {question}\nAssistant: {response}')
 ```


-
-
 ## Citation
 If you find this repository helpful, feel free to cite our paper:
 ```bibtex
-@article{
-
-
-
-
-
 ```
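The removed **Notes** line above says model outputs are relative coordinates normalized to a 0-1000 range, which must be scaled back to the screenshot's dimensions before visualization. A minimal sketch of that conversion, assuming the usual (x1, y1, x2, y2) box layout (the helper name and values are illustrative, not from the repository):

```python
def to_pixels(coords, width, height):
    """Scale relative coordinates (0-1000 range) back to pixel coordinates.

    `coords` is (x1, y1, x2, y2) for a bounding box, or (x, y) for a
    center point; x-values scale by width, y-values by height.
    """
    scale = (width / 1000.0, height / 1000.0)
    return tuple(round(v * scale[i % 2]) for i, v in enumerate(coords))

# A predicted box of (250, 500, 750, 900) on a 1920x1080 screenshot:
print(to_pixels((250, 500, 750, 900), 1920, 1080))  # (480, 540, 1440, 972)
```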
 </div>

 ## Overview
+![os-genesis](https://cdn-uploads.huggingface.co/production/uploads/6064a0eeb1703ddba0d458b9/XvcAh92uvJQglmIu_L_nK.png)

+We introduce OS-Genesis, an interaction-driven pipeline that synthesizes high-quality and diverse GUI agent trajectory data without human supervision. By leveraging reverse task synthesis, OS-Genesis enables effective training of GUI agents to achieve superior performance on dynamic benchmarks such as AndroidWorld and WebArena.

+## Quick Start
+OS-Genesis-8B-AC is a mobile action model finetuned from [InternVL2-8B](https://huggingface.co/OpenGVLab/InternVL2-8B).

+### Model Zoo

 ### Inference Example
 First, install the `transformers` library:
 pixel_values = load_image('./web_dfacd48d-d2c2-492f-b94c-41e6a34ea99f.png', max_num=6).to(torch.bfloat16).cuda()
 generation_config = dict(max_new_tokens=1024, do_sample=True)

+question = "<image> You are a GUI task expert, I will provide you with a high-level instruction, an action history, a screenshot with its corresponding accessibility tree.\n High-level instruction: {high_level_instruction}\n Action history: {action_history}\n Accessibility tree: {a11y_tree}\n Please generate the low-level thought and action for the next step."
 response, history = model.chat(tokenizer, pixel_values, question, generation_config, history=None, return_history=True)
 print(f'User: {question}\nAssistant: {response}')
 ```
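The added `question` string is a template with `{high_level_instruction}`, `{action_history}`, and `{a11y_tree}` placeholders, which presumably need to be filled with the episode state before calling `model.chat`. A minimal sketch using `str.format` (the instruction, history, and accessibility-tree values are made-up examples, not from the repository):

```python
# Same template as the snippet above, with named placeholders left in place.
prompt_template = (
    "<image> You are a GUI task expert, I will provide you with a high-level "
    "instruction, an action history, a screenshot with its corresponding "
    "accessibility tree.\n High-level instruction: {high_level_instruction}\n "
    "Action history: {action_history}\n Accessibility tree: {a11y_tree}\n "
    "Please generate the low-level thought and action for the next step."
)

# Hypothetical episode state for one step of a mobile task:
question = prompt_template.format(
    high_level_instruction="Turn on Wi-Fi in the Settings app.",
    action_history="step 1: open_app app_name='Settings'",
    a11y_tree="[0] Settings  [1] Network & internet  [2] Wi-Fi (switch, off)",
)
```

The filled `question` then replaces the literal template string passed to `model.chat` in the snippet above.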

 ## Citation
 If you find this repository helpful, feel free to cite our paper:
 ```bibtex
+@article{sun2024osgenesis,
+  title={OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis},
+  author={Qiushi Sun and Kanzhi Cheng and Zichen Ding and Chuanyang Jin and Yian Wang and Fangzhi Xu and Zhenyu Wu and Chengyou Jia and Liheng Chen and Zhoumianze Liu and Ben Kao and Guohao Li and Junxian He and Yu Qiao and Zhiyong Wu},
+  journal={arXiv preprint arXiv:2412.19723},
+  year={2024}
+}
 ```