Can LLMs Generate Human-Like Wayfinding Instructions? Towards Platform-Agnostic Embodied Instruction Synthesis Paper • 2403.11487 • Published Mar 18, 2024 • 1
Style Customization of Text-to-Vector Generation with Image Diffusion Priors Paper • 2505.10558 • Published May 15 • 15
Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models Paper • 2505.04921 • Published May 8 • 179
Describe Anything Collection Multimodal Large Language Models for Detailed Localized Image and Video Captioning • 7 items • Updated 1 day ago • 52