Giuliano's Collections

Agents GUI

updated Feb 16

  • ShowUI: One Vision-Language-Action Model for GUI Visual Agent

    Paper • 2411.17465 • Published Nov 26, 2024 • 88

  • OmniParser for Pure Vision Based GUI Agent

    Paper • 2408.00203 • Published Aug 1, 2024 • 26

  • Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction

    Paper • 2412.04454 • Published Dec 5, 2024 • 66

  • THUDM/cogagent-9b-20241220

    Image-Text-to-Text • 14B • Updated Dec 25, 2024 • 305 • 53

  • CogAgent: A Visual Language Model for GUI Agents

    Paper • 2312.08914 • Published Dec 14, 2023 • 31

  • Running on Zero
    7

    CogAgent Demo

    CogAgent-GUI-Demo


  • A3: Android Agent Arena for Mobile GUI Agents

    Paper • 2501.01149 • Published Jan 2 • 22

  • xlangai/Aguvis-7B-720P

    8B • Updated Jan 7 • 1.37k • 8

  • OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis

    Paper • 2412.19723 • Published Dec 27, 2024 • 88

  • Running
    33

    UI-TARS

    🌖

    Generate click coordinates from image and instruction


  • microsoft/OmniParser-v2.0

    Updated Mar 28 • 1.03k • 1.27k