We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning Paper • 2508.10433 • Published 5 days ago • 140
ToonComposer: Streamlining Cartoon Production with Generative Post-Keyframing Paper • 2508.10881 • Published 5 days ago • 49
DINOv3 Collection DINOv3: foundation models producing excellent dense features, outperforming SotA w/o fine-tuning - https://arxiv.org/abs/2508.10104 • 13 items • Updated 4 days ago • 168
Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory Paper • 2508.09736 • Published 6 days ago • 48
Story2Board: A Training-Free Approach for Expressive Storyboard Generation Paper • 2508.09983 • Published 6 days ago • 61
WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent Paper • 2508.05748 • Published 12 days ago • 112
WideSearch: Benchmarking Agentic Broad Info-Seeking Paper • 2508.07999 • Published 8 days ago • 102
EmergentTTS-Eval: Evaluating TTS Models on Complex Prosodic, Expressiveness, and Linguistic Challenges Using Model-as-a-Judge Paper • 2505.23009 • Published May 29 • 18
TokensGen: Harnessing Condensed Tokens for Long Video Generation Paper • 2507.15728 • Published 29 days ago • 7
VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild Paper • 2211.14758 • Published Nov 27, 2022 • 2
Unlocking Healthcare AI: I'm Releasing State-of-the-Art Medical Models for Free. Forever. Article • By MaziyarPanahi • Jul 16 • 133
PresentAgent: Multimodal Agent for Presentation Video Generation Paper • 2507.04036 • Published Jul 5 • 10
SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks Paper • 2507.01001 • Published Jul 1 • 45
UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning Paper • 2505.14231 • Published May 20 • 53