We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning Paper • 2508.10433 • Published 5 days ago • 140
ToonComposer: Streamlining Cartoon Production with Generative Post-Keyframing Paper • 2508.10881 • Published 5 days ago • 49
DINOv3 Collection DINOv3: foundation models producing excellent dense features, outperforming SotA w/o fine-tuning - https://arxiv.org/abs/2508.10104 • 13 items • Updated 4 days ago • 167
Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory Paper • 2508.09736 • Published 6 days ago • 48
Story2Board: A Training-Free Approach for Expressive Storyboard Generation Paper • 2508.09983 • Published 6 days ago • 61
WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent Paper • 2508.05748 • Published 12 days ago • 112
WideSearch: Benchmarking Agentic Broad Info-Seeking Paper • 2508.07999 • Published 8 days ago • 102
EmergentTTS-Eval: Evaluating TTS Models on Complex Prosodic, Expressiveness, and Linguistic Challenges Using Model-as-a-Judge Paper • 2505.23009 • Published May 29 • 18
bosonai/higgs-audio-v2-generation-3B-base Text-to-Speech • 6B • Updated 22 days ago • 309k • 564
TokensGen: Harnessing Condensed Tokens for Long Video Generation Paper • 2507.15728 • Published 29 days ago • 7