ImgEdit: A Unified Image Editing Dataset and Benchmark Paper • 2505.20275 • Published 12 days ago • 17
Exploring the Latent Capacity of LLMs for One-Step Text Generation Paper • 2505.21189 • Published 12 days ago • 60
OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data Paper • 2505.18445 • Published 15 days ago • 63
MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs Paper • 2505.21327 • Published 11 days ago • 82
Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers Paper • 2505.21497 • Published 11 days ago • 93
G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning Paper • 2505.13426 • Published 19 days ago • 12
SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding Paper • 2505.17012 • Published 16 days ago • 12
Diffusion Classifiers Understand Compositionality, but Conditions Apply Paper • 2505.17955 • Published 16 days ago • 20
Vision Language Models (Better, Faster, Stronger) Article • By merve and 4 others • 27 days ago • 420
CLIP Behaves like a Bag-of-Words Model Cross-modally but not Uni-modally Paper • 2502.03566 • Published Feb 5 • 2