A Survey on Long-Video Storytelling Generation: Architectures, Consistency, and Cinematic Quality Paper • 2507.07202 • Published 30 days ago • 22
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning Paper • 2507.01006 • Published Jul 1 • 210
Embodied Web Agents: Bridging Physical-Digital Realms for Integrated Agent Intelligence Paper • 2506.15677 • Published Jun 18 • 24
Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers Paper • 2505.21497 • Published May 27 • 108
MOSAIC: Modeling Social AI for Content Dissemination and Regulation in Multi-Agent Simulations Paper • 2504.07830 • Published Apr 10 • 18
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs Paper • 2503.01743 • Published Mar 3 • 88
Scaling Autonomous Agents via Automatic Reward Modeling And Planning Paper • 2502.12130 • Published Feb 17 • 2
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training Paper • 2501.17161 • Published Jan 28 • 123
QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search Paper • 2502.02584 • Published Feb 4 • 17
MiniMax-01: Scaling Foundation Models with Lightning Attention Paper • 2501.08313 • Published Jan 14 • 298
The Lessons of Developing Process Reward Models in Mathematical Reasoning Paper • 2501.07301 • Published Jan 13 • 100
DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding Paper • 2411.14347 • Published Nov 21, 2024 • 15
Training-free Regional Prompting for Diffusion Transformers Paper • 2411.02395 • Published Nov 4, 2024 • 26
How Far is Video Generation from World Model: A Physical Law Perspective Paper • 2411.02385 • Published Nov 4, 2024 • 35
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents Paper • 2410.23218 • Published Oct 30, 2024 • 51
Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning Paper • 2410.22304 • Published Oct 29, 2024 • 18