Efficient Multimodal Learning from Data-centric Perspective Paper • 2402.11530 • Published Feb 18, 2024 • 1
Seeing Clearly, Answering Incorrectly: A Multimodal Robustness Benchmark for Evaluating MLLMs on Leading Questions Paper • 2406.10638 • Published Jun 15, 2024
MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation Paper • 2502.11903 • Published Feb 17
When Semantics Mislead Vision: Mitigating Large Multimodal Models Hallucinations in Scene Text Spotting and Understanding Paper • 2506.05551 • Published Jun 5 • 5
TalkVid: A Large-Scale Diversified Dataset for Audio-Driven Talking Head Synthesis Paper • 2508.13618 • Published Aug 19 • 18
Model Reveals What to Cache: Profiling-Based Feature Reuse for Video Diffusion Models Paper • 2504.03140 • Published Apr 4
EditThinker: Unlocking Iterative Reasoning for Any Image Editor Paper • 2512.05965 • Published 20 days ago • 38
OpenSubject: Leveraging Video-Derived Identity and Diversity Priors for Subject-driven Image Generation and Manipulation Paper • 2512.08294 • Published 17 days ago • 17
AlignVid: Training-Free Attention Scaling for Semantic Fidelity in Text-Guided Image-to-Video Generation Paper • 2512.01334 • Published 25 days ago