VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding • Paper • 2501.13106 • Published 5 days ago • 70 • 3
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning • Paper • 2501.12948 • Published 5 days ago • 225 • 4
FilmAgent: A Multi-Agent Framework for End-to-End Film Automation in Virtual 3D Spaces • Paper • 2501.12909 • Published 6 days ago • 60 • 3
Kimi k1.5: Scaling Reinforcement Learning with LLMs • Paper • 2501.12599 • Published 6 days ago • 65 • 3
O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning • Paper • 2501.12570 • Published 6 days ago • 20 • 2
Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise • Paper • 2501.08331 • Published 13 days ago • 17 • 3
Video Depth Anything: Consistent Depth Estimation for Super-Long Videos • Paper • 2501.12375 • Published 6 days ago • 18 • 2
Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation • Paper • 2501.12202 • Published 6 days ago • 26 • 4
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training • Paper • 2501.11425 • Published 8 days ago • 77 • 2
Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments • Paper • 2501.10893 • Published 9 days ago • 22 • 2
UI-TARS: Pioneering Automated GUI Interaction with Native Agents • Paper • 2501.12326 • Published 6 days ago • 45 • 5
HiFi-SR: A Unified Generative Transformer-Convolutional Adversarial Network for High-Fidelity Speech Super-Resolution • Paper • 2501.10045 • Published 11 days ago • 8 • 3
Textoon: Generating Vivid 2D Cartoon Characters from Text Descriptions • Paper • 2501.10020 • Published 11 days ago • 22 • 2
GaussianAvatar-Editor: Photorealistic Animatable Gaussian Head Avatar Editor • Paper • 2501.09978 • Published 11 days ago • 6 • 2
PaSa: An LLM Agent for Comprehensive Academic Paper Search • Paper • 2501.10120 • Published 11 days ago • 38 • 10