VLM-R^3: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought Paper • 2505.16192 • Published 3 days ago • 6
VideoGameQA-Bench: Evaluating Vision-Language Models for Video Game Quality Assurance Paper • 2505.15952 • Published 3 days ago • 18
Understanding Generative AI Capabilities in Everyday Image Editing Tasks Paper • 2505.16181 • Published 3 days ago • 20
VisualQuality-R1: Reasoning-Induced Image Quality Assessment via Reinforcement Learning to Rank Paper • 2505.14460 • Published 5 days ago • 30
Emerging Properties in Unified Multimodal Pretraining Paper • 2505.14683 • Published 4 days ago • 116
HoT: Highlighted Chain of Thought for Referencing Supporting Facts from Inputs Paper • 2503.02003 • Published Mar 3 • 48
LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers Paper • 2502.15007 • Published Feb 20 • 175
ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models Paper • 2502.09696 • Published Feb 13 • 44
OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models Paper • 2502.01061 • Published Feb 3 • 213
LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture Paper • 2409.02889 • Published Sep 4, 2024 • 55
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22, 2024 • 131
VideoGameBunny: Towards vision assistants for video games Paper • 2407.15295 • Published Jul 21, 2024 • 22