Describe Anything • Collection • Multimodal Large Language Models for Detailed Localized Image and Video Captioning • 7 items • Updated 2 days ago • 40
Describe Anything: Detailed Localized Image and Video Captioning • Paper • 2504.16072 • Published 4 days ago • 49
Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control • Paper • 2503.14492 • Published Mar 18 • 18
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning • Paper • 2503.15558 • Published Mar 18 • 46
Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models • Paper • 2411.07126 • Published Nov 11, 2024 • 31
Cosmos Tokenizer • Collection • A suite of image and video tokenizers • 13 items • Updated 4 days ago • 40