Perceptual Decoupling for Scalable Multi-modal Reasoning via Reward-Optimized Captioning Paper • 2506.04559 • Published 14 days ago • 2 • 1
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions Paper • 2409.18042 • Published Sep 26, 2024 • 41 • 13
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions Paper • 2409.18042 • Published Sep 26, 2024 • 41 • 13
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions Paper • 2409.18042 • Published Sep 26, 2024 • 41 • 13
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions Paper • 2409.18042 • Published Sep 26, 2024 • 41 • 13
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions Paper • 2409.18042 • Published Sep 26, 2024 • 41 • 13
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions Paper • 2409.18042 • Published Sep 26, 2024 • 41 • 13