MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence Paper • 2505.23764 • Published May 29 • 4
Towards Storage-Efficient Visual Document Retrieval: An Empirical Study on Reducing Patch-Level Embeddings Paper • 2506.04997 • Published Jun 5
MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents Paper • 2507.19478 • Published Jul 25 • 29
GOBench: Benchmarking Geometric Optics Generation and Understanding of MLLMs Paper • 2506.00991 • Published Jun 1
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency Paper • 2508.18265 • Published 2 days ago • 142
CLiFT: Compressive Light-Field Tokens for Compute-Efficient and Adaptive Neural Rendering Paper • 2507.08776 • Published Jul 11 • 54
BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing Paper • 2506.17450 • Published Jun 20 • 63
Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing Paper • 2504.02826 • Published Apr 3 • 69
LEGO-Puzzles: How Good Are MLLMs at Multi-Step Spatial Reasoning? Paper • 2503.19990 • Published Mar 25 • 36
Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM Paper • 2503.14478 • Published Mar 18 • 49
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning Paper • 2503.10291 • Published Mar 13 • 37
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference Paper • 2502.18411 • Published Feb 25 • 75
VideoRoPE: What Makes for Good Video Rotary Position Embedding? Paper • 2502.05173 • Published Feb 7 • 66
OCSampler: Compressing Videos to One Clip with Single-step Sampling Paper • 2201.04388 • Published Jan 12, 2022
Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement Paper • 2501.12273 • Published Jan 21 • 14