Describe Anything Collection Multimodal Large Language Models for Detailed Localized Image and Video Captioning • 7 items • Updated 19 days ago • 54