ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment Paper • 2403.05135 • Published Mar 8, 2024
DeepSeek-VL: Towards Real-World Vision-Language Understanding Paper • 2403.05525 • Published Mar 8, 2024
CoCa: Contrastive Captioners are Image-Text Foundation Models Paper • 2205.01917 • Published May 4, 2022