Vidi: Large Multimodal Models for Video Understanding and Editing Paper • 2504.15681 • Published Apr 22 • 13
Complex-Edit: CoT-Like Instruction Generation for Complexity-Controllable Image Editing Benchmark Paper • 2504.13143 • Published Apr 17 • 8
VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning Paper • 2504.07960 • Published Apr 10 • 46
A Unified Agentic Framework for Evaluating Conditional Image Generation Paper • 2504.07046 • Published Apr 9 • 30
VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning Paper • 2504.06958 • Published Apr 9 • 10
Orpheus Multilingual Research Release Collection • Beta release of multilingual models • 12 items • Updated Apr 10 • 76
UI-TARS: Pioneering Automated GUI Interaction with Native Agents Paper • 2501.12326 • Published Jan 21 • 57
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer Paper • 2408.06072 • Published Aug 12, 2024 • 40
FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait Paper • 2412.01064 • Published Dec 2, 2024 • 30
AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation Paper • 2404.12753 • Published Apr 19, 2024 • 44
CosmicMan: A Text-to-Image Foundation Model for Humans Paper • 2404.01294 • Published Apr 1, 2024 • 16
Magic-Me: Identity-Specific Video Customized Diffusion Paper • 2402.09368 • Published Feb 14, 2024 • 30
EmoTalker: Emotionally Editable Talking Face Generation via Diffusion Model Paper • 2401.08049 • Published Jan 16, 2024 • 3
Emo-Avatar: Efficient Monocular Video Style Avatar through Texture Rendering Paper • 2402.00827 • Published Feb 1, 2024 • 2
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions Paper • 2402.17485 • Published Feb 27, 2024 • 195
VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis Paper • 2403.08764 • Published Mar 13, 2024 • 37