Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
Paper • 2404.05719 • Published • 83
ShowUI: One Vision-Language-Action Model for GUI Visual Agent
Paper • 2411.17465 • Published • 87
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms
Paper • 2410.18967 • Published • 1
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
Paper • 2410.23218 • Published • 51
Babu
BaMbUM
AI & ML interests
None yet
Recent Activity
updated a collection about 2 months ago
GUI Agents
upvoted a paper about 2 months ago
UI-TARS: Pioneering Automated GUI Interaction with Native Agents
upvoted an article 3 months ago
Open-R1: a fully open reproduction of DeepSeek-R1
Organizations
None yet
Collections 1
models 0
None public yet
datasets 0
None public yet