Improved Visual-Spatial Reasoning via R1-Zero-Like Training Paper • 2504.00883 • Published 3 days ago • 42
Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources Paper • 2504.00595 • Published 3 days ago • 26
Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation Paper • 2503.24379 • Published 3 days ago • 60
TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scenes Paper • 2503.23461 • Published 5 days ago • 72
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model Paper • 2503.24290 • Published 3 days ago • 52
RIG: Synergizing Reasoning and Imagination in End-to-End Generalist Policy Paper • 2503.24388 • Published 3 days ago • 24
What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models Paper • 2503.24235 • Published 3 days ago • 42
MoCha: Towards Movie-Grade Talking Character Synthesis Paper • 2503.23307 • Published 5 days ago • 65
Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback Paper • 2503.22230 • Published 7 days ago • 41
AdaptiVocab: Enhancing LLM Efficiency in Focused Domains through Lightweight Vocabulary Adaptation Paper • 2503.19693 • Published 10 days ago • 66
Think Before Recommend: Unleashing the Latent Reasoning Power for Sequential Recommendation Paper • 2503.22675 • Published 6 days ago • 31
Open Deep Search: Democratizing Search with Open-source Reasoning Agents Paper • 2503.20201 • Published 9 days ago • 39
Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy Paper • 2503.19757 • Published 10 days ago • 47
Long-Context Autoregressive Video Modeling with Next-Frame Prediction Paper • 2503.19325 • Published 10 days ago • 70
When Less is Enough: Adaptive Token Reduction for Efficient Image Representation Paper • 2503.16660 • Published 14 days ago • 70
A Comprehensive Survey on Long Context Language Modeling Paper • 2503.17407 • Published 14 days ago • 47