OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding Paper • 2507.07984 • Published 1 day ago
T-LoRA: Single Image Diffusion Model Customization Without Overfitting Paper • 2507.05964 • Published 4 days ago
LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS Paper • 2507.07136 • Published 3 days ago
Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology Paper • 2507.07999 • Published 1 day ago
A Survey on Long-Video Storytelling Generation: Architectures, Consistency, and Cinematic Quality Paper • 2507.07202 • Published 3 days ago
Perception-Aware Policy Optimization for Multimodal Reasoning Paper • 2507.06448 • Published 3 days ago
Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data Paper • 2507.07095 • Published 3 days ago
StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling Paper • 2507.05240 • Published 5 days ago
How to Train Your LLM Web Agent: A Statistical Diagnosis Paper • 2507.04103 • Published 7 days ago
MedGen: Unlocking Medical Video Generation by Scaling Granularly-annotated Medical Videos Paper • 2507.05675 • Published 4 days ago
RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents Paper • 2507.03112 • Published 9 days ago
CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization Paper • 2507.06181 • Published 4 days ago
StreamDiT: Real-Time Streaming Text-to-Video Generation Paper • 2507.03745 • Published 8 days ago
DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge Paper • 2507.04447 • Published 6 days ago
Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving Paper • 2507.06229 • Published 4 days ago