Beyond Distillation: Pushing the Limits of Medical LLM Reasoning with Minimalist Rule-Based RL Paper • 2505.17952 • Published 6 days ago • 18
MoCha: Towards Movie-Grade Talking Character Synthesis Paper • 2503.23307 • Published Mar 30 • 133
TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scenes Paper • 2503.23461 • Published Mar 30 • 95
MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning Paper • 2502.19634 • Published Feb 26 • 63
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published Feb 20 • 144
Chirpy3D: Continuous Part Latents for Creative 3D Bird Generation Paper • 2501.04144 • Published Jan 7 • 19
Qwen2-VL Collection Vision-language model series based on Qwen2 • 16 items • Updated Apr 28 • 216
AI Paper of the Day Collection A collection of papers that I think are interesting, one added each day • 370 items • Updated 1 day ago • 41
LAION-SG: An Enhanced Large-Scale Dataset for Training Complex Image-Text Models with Structural Annotations Paper • 2412.08580 • Published Dec 11, 2024 • 46
Learning Flow Fields in Attention for Controllable Person Image Generation Paper • 2412.08486 • Published Dec 11, 2024 • 37
MarDini: Masked Autoregressive Diffusion for Video Generation at Scale Paper • 2410.20280 • Published Oct 26, 2024 • 23