microsoft/GUI-Actor-3B-Qwen2.5-VL

Image-Text-to-Text

text-generation-inference


Update README.md

#1

by BustaHeroMax - opened Jun 11

base: refs/heads/main ← from: refs/pr/1


Files changed (1)

README.md +1 -1

README.md CHANGED

@@ -6,7 +6,7 @@ library_name: transformers
 pipeline_tag: image-text-to-text
 ---
-# GUI-Actor-7B with Qwen2.5-VL-7B as backbone VLM
+# GUI-Actor-3B with Qwen2.5-VL-3B as backbone VLM
 This model was introduced in the paper [**GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents**](https://huggingface.co/papers/2506.03143).
 It is developed based on [Qwen2.5-VL-3B-Instruct ](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct), augmented by an attention-based action head and finetuned to perform GUI grounding using the dataset [here](https://huggingface.co/datasets/cckevinn/GUI-Actor-Data).