Perceptual Decoupling for Scalable Multi-modal Reasoning via Reward-Optimized Captioning Paper • 2506.04559 • Published 14 days ago • 2
EMOVA-Datasets Collection A collection of EMOVA datasets (https://emova-ollm.github.io/) • 6 items • Updated Mar 14 • 2
EMOVA-Models Collection A collection of EMOVA models (https://emova-ollm.github.io/) • 11 items • Updated Mar 14 • 3
MagicDriveDiT: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control Paper • 2411.13807 • Published Nov 21, 2024 • 11
GeoDiffusion Collection A collection of GeoDiffusion checkpoints (https://kaichen1998.github.io/projects/geodiffusion/) • 11 items • Updated Dec 5, 2024 • 2
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions Paper • 2409.18042 • Published Sep 26, 2024 • 41