Wolf: Captioning Everything with a World Summarization Framework • Paper 2407.18908 • Published Jul 26, 2024
Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation • Paper 2404.19752 • Published Apr 30, 2024
Bootstrapping Objectness from Videos by Relaxed Common Fate and Visual Grouping • Paper 2304.08025 • Published Apr 17, 2023
LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models • Paper 2305.13655 • Published May 23, 2023