Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Context Sparsification Paper • 2412.00876 • Published Dec 1, 2024
Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models Paper • 2503.06749 • Published Mar 9 • 29
VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning Paper • 2504.07956 • Published 15 days ago • 45
VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning Paper • 2504.07956 • Published 15 days ago • 45
VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning Paper • 2504.07956 • Published 15 days ago • 45
VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning Paper • 2504.07956 • Published 15 days ago • 45
VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning Paper • 2504.07956 • Published 15 days ago • 45
VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning Paper • 2504.07956 • Published 15 days ago • 45
ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents Paper • 2502.18017 • Published Feb 25 • 20
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training Paper • 2501.11425 • Published Jan 20 • 105
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions Paper • 2412.09596 • Published Dec 12, 2024 • 99
Open-Sora Plan: Open-Source Large Video Generation Model Paper • 2412.00131 • Published Nov 28, 2024 • 34
MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines Paper • 2409.12959 • Published Sep 19, 2024 • 38
MindSearch: Mimicking Human Minds Elicits Deep AI Searcher Paper • 2407.20183 • Published Jul 29, 2024 • 44