Orpheus Multilingual Research Release Collection Beta Release of multilingual models. • 12 items • Updated 16 days ago • 76
TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scenes Paper • 2503.23461 • Published 27 days ago • 94
TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding Paper • 2502.19400 • Published Feb 26 • 49
ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation Paper • 2502.18364 • Published Feb 25 • 36
MutaGReP: Execution-Free Repository-Grounded Plan Search for Code-Use Paper • 2502.15872 • Published Feb 21 • 5
Qwen2.5-VL Collection Vision-language model series based on Qwen2.5 • 11 items • Updated 26 days ago • 448
FilmAgent: A Multi-Agent Framework for End-to-End Film Automation in Virtual 3D Spaces Paper • 2501.12909 • Published Jan 22 • 70
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding Paper • 2501.12380 • Published Jan 21 • 86
HiFi-SR: A Unified Generative Transformer-Convolutional Adversarial Network for High-Fidelity Speech Super-Resolution Paper • 2501.10045 • Published Jan 17 • 9
SynthLight: Portrait Relighting with Diffusion Model by Learning to Re-render Synthetic Faces Paper • 2501.09756 • Published Jan 16 • 19
MangaNinja: Line Art Colorization with Precise Reference Following Paper • 2501.08332 • Published Jan 14 • 60
Visual Document Retrieval Collection A collection of models, datasets, and spaces in the VDR series • 5 items • Updated Jan 10 • 8
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs Paper • 2501.06186 • Published Jan 10 • 66
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos Paper • 2501.04001 • Published Jan 7 • 46
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation Paper • 2412.07589 • Published Dec 10, 2024 • 49