GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing Paper • 2503.10639 • Published Mar 13 • 50
R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization Paper • 2503.10615 • Published Mar 13 • 17
Phi-4 Collection Phi-4 family of small language and multi-modal models. • 9 items • Updated 8 days ago • 117
4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models Paper • 2503.10437 • Published Mar 13 • 32
SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering? Paper • 2502.12115 • Published Feb 17 • 45
DreamReward: Text-to-3D Generation with Human Preference Paper • 2403.14613 • Published Mar 21, 2024 • 38
MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers Paper • 2406.10163 • Published Jun 14, 2024 • 34
Mixture-of-Agents Enhances Large Language Model Capabilities Paper • 2406.04692 • Published Jun 7, 2024 • 60
YOLOv10 Collection This collection hosts the YOLOv10 model releases • 16 items • Updated Jun 3, 2024 • 18
DragAPart: Learning a Part-Level Motion Prior for Articulated Objects Paper • 2403.15382 • Published Mar 22, 2024 • 11
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training Paper • 2403.09611 • Published Mar 14, 2024 • 128