SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis Paper • 2307.01952 • Published Jul 4, 2023
Ichigo: Mixed-Modal Early-Fusion Realtime Voice Assistant Paper • 2410.15316 • Published Oct 20, 2024
DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation Paper • 2208.12242 • Published Aug 25, 2022
Learning Flow Fields in Attention for Controllable Person Image Generation Paper • 2412.08486 • Published Dec 11, 2024
Let's Go Shopping (LGS) -- Web-Scale Image-Text Dataset for Visual Concept Understanding Paper • 2401.04575 • Published Jan 9, 2024
WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines Paper • 2410.12705 • Published Oct 16, 2024
Guiding a Diffusion Model with a Bad Version of Itself Paper • 2406.02507 • Published Jun 4, 2024
RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots Paper • 2406.02523 • Published Jun 4, 2024
V-Express: Conditional Dropout for Progressive Training of Portrait Video Generation Paper • 2406.02511 • Published Jun 4, 2024
I4VGen: Image as Stepping Stone for Text-to-Video Generation Paper • 2406.02230 • Published Jun 4, 2024