Look Where It Matters: High-Resolution Crops Retrieval for Efficient VLMs Paper • 2603.16932 • Published Mar 14 • 89
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency Paper • 2508.18265 • Published Aug 25, 2025 • 222
Video Analysis and Generation via a Semantic Progress Function Paper • 2604.22554 • Published 28 days ago • 63
Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation Paper • 2604.24763 • Published 25 days ago • 71
Denoising, Fast and Slow: Difficulty-Aware Adaptive Sampling for Image Generation Paper • 2604.19141 • Published Apr 21 • 1
LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model Paper • 2604.20796 • Published about 1 month ago • 240
DiffEM: Learning from Corrupted Data with Diffusion Models via Expectation Maximization Paper • 2510.12691 • Published Dec 20, 2025 • 1
OmniWeaving: Towards Unified Video Generation with Free-form Composition and Reasoning Paper • 2603.24458 • Published Mar 25 • 9