LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning Paper • 2206.06522 • Published Jun 13, 2022
Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization Paper • 2504.08641 • Published 15 days ago
CAPTURe: Evaluating Spatial Reasoning in Vision Language Models via Occluded Object Counting Paper • 2504.15485 • Published 5 days ago
Executable Functional Abstractions: Inferring Generative Programs for Advanced Math Problems Paper • 2504.09763 • Published 13 days ago
M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding Paper • 2411.04952 • Published Nov 7, 2024
VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement Paper • 2411.15115 • Published Nov 22, 2024
Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation Paper • 2304.06671 • Published Apr 13, 2023
DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback Paper • 2410.06215 • Published Oct 8, 2024
Self-Chained Image-Language Model for Video Localization and Question Answering Paper • 2305.06988 • Published May 11, 2023
DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generative Models Paper • 2202.04053 • Published Feb 8, 2022
Visual Programming for Text-to-Image Generation and Evaluation Paper • 2305.15328 • Published May 24, 2023
VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks Paper • 2112.06825 • Published Dec 13, 2021
VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer Paper • 2107.02681 • Published Jul 6, 2021
EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents Paper • 2403.12014 • Published Mar 18, 2024
Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation Paper • 2310.18235 • Published Oct 27, 2023
SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data Paper • 2403.06952 • Published Mar 11, 2024
DOCCI: Descriptions of Connected and Contrasting Images Paper • 2404.19753 • Published Apr 30, 2024