Sekai: A Video Dataset towards World Exploration Paper • 2506.15675 • Published 22 days ago • 62
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention Paper • 2506.13585 • Published 24 days ago • 252
SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning Paper • 2506.01713 • Published Jun 2 • 46
Beyond Distillation: Pushing the Limits of Medical LLM Reasoning with Minimalist Rule-Based RL Paper • 2505.17952 • Published May 23 • 21
MoCha: Towards Movie-Grade Talking Character Synthesis Paper • 2503.23307 • Published Mar 30 • 136
TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scenes Paper • 2503.23461 • Published Mar 30 • 95
FLUX.1 Collection A collection of our FLUX.1 models and LoRAs. • 9 items • Updated 14 days ago • 146
MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning Paper • 2502.19634 • Published Feb 26 • 63
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published Feb 20 • 146
Chirpy3D: Continuous Part Latents for Creative 3D Bird Generation Paper • 2501.04144 • Published Jan 7 • 19
Qwen2-VL Collection Vision-language model series based on Qwen2 • 16 items • Updated Apr 28 • 220
AI Paper of the Day Collection A collection of papers that I think are interesting, one added each day • 405 items • Updated about 18 hours ago • 52