Can LLMs Generate Human-Like Wayfinding Instructions? Towards Platform-Agnostic Embodied Instruction Synthesis • Paper • 2403.11487 • Published Mar 18, 2024 • 1
Style Customization of Text-to-Vector Generation with Image Diffusion Priors • Paper • 2505.10558 • Published about 1 month ago • 15
Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models • Paper • 2505.04921 • Published May 8 • 176
Describe Anything • Collection • Multimodal Large Language Models for Detailed Localized Image and Video Captioning • 7 items • Updated 3 days ago • 51