Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1 • 2503.24376
OThink-MR1: Stimulating multimodal generalized reasoning capabilities via dynamic reinforcement learning • 2503.16081
Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks • 2503.21696
Large Language Model Agent: A Survey on Methodology, Applications and Challenges • 2503.21460
LEGO-Puzzles: How Good Are MLLMs at Multi-Step Spatial Reasoning? • 2503.19990
Wan: Open and Advanced Large-Scale Video Generative Models • 2503.20314
CoLLM: A Large Language Model for Composed Image Retrieval • 2503.19910
MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding • 2503.13964
Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking • 2503.19855
CoMP: Continual Multimodal Pre-training for Vision Foundation Models • 2503.18931
Instruct-CLIP: Improving Instruction-Guided Image Editing with Automated Data Refinement Using Contrastive Learning • 2503.18406
Video SimpleQA: Towards Factuality Evaluation in Large Video Language Models • 2503.18923
Vision-R1: Evolving Human-Free Alignment in Large Vision-Language Models via Vision-Guided Reinforcement Learning • 2503.18013
Fin-R1: A Large Language Model for Financial Reasoning through Reinforcement Learning • 2503.16252