- Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition • Paper • arXiv:2506.17201 • Published 4 days ago • 32 upvotes
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning • Paper • arXiv:2506.13654 • Published 8 days ago • 42 upvotes
- SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs • Paper • arXiv:2506.05344 • Published 19 days ago • 16 upvotes
- StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant • Paper • arXiv:2505.05467 • Published May 8 • 13 upvotes
- Absolute Zero: Reinforced Self-play Reasoning with Zero Data • Paper • arXiv:2505.03335 • Published May 6 • 173 upvotes
- R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning • Paper • arXiv:2505.02835 • Published May 5 • 26 upvotes
- Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models • Paper • arXiv:2504.15271 • Published Apr 21 • 65 upvotes
- VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning • Paper • arXiv:2504.08837 • Published Apr 10 • 43 upvotes
- InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models • Paper • arXiv:2504.10479 • Published Apr 14 • 274 upvotes
- Synthetic Video Enhances Physical Fidelity in Video Synthesis • Paper • arXiv:2503.20822 • Published Mar 26 • 16 upvotes
- VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness • Paper • arXiv:2503.21755 • Published Mar 27 • 34 upvotes
- EgoLife • Collection (CVPR 2025): EgoLife: Towards Egocentric Life Assistant. Homepage: https://egolife-ai.github.io/ • 10 items • Updated Mar 7 • 19 upvotes
- SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features • Paper • arXiv:2502.14786 • Published Feb 20 • 145 upvotes
- Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment • Paper • arXiv:2502.04328 • Published Feb 6 • 30 upvotes