Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning • Paper • 2507.06485 • Published Jul 9 • 4
Perception-Aware Policy Optimization for Multimodal Reasoning • Paper • 2507.06448 • Published Jul 8 • 45
Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers • Paper • 2506.23918 • Published Jun 30 • 86
LongAnimation: Long Animation Generation with Dynamic Global-Local Memory • Paper • 2507.01945 • Published Jul 2 • 76
VideoDeepResearch: Long Video Understanding With Agentic Tool Using • Paper • 2506.10821 • Published Jun 12 • 20
EarthMind: Towards Multi-Granular and Multi-Sensor Earth Observation with Large Multimodal Models • Paper • 2506.01667 • Published Jun 2 • 21
Temporal Preference Optimization for Long-Form Video Understanding • Paper • 2501.13919 • Published Jan 23 • 23
MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval • Paper • 2412.14475 • Published Dec 19, 2024 • 55
OpenResearcher: Unleashing AI for Accelerated Scientific Research • Paper • 2408.06941 • Published Aug 13, 2024 • 33
SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding • Paper • 2408.15545 • Published Aug 28, 2024 • 38
HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models in Resource-Constrained Environments • Paper • 2408.10945 • Published Aug 20, 2024 • 11