view article Article SmolLM3: smol, multilingual, long-context reasoner By loubnabnl and 22 others • 30 days ago • 611
Pre-Trained Policy Discriminators are General Reward Models Paper • 2507.05197 • Published 30 days ago • 38
view article Article Open-source DeepResearch – Freeing our search agents By m-ric and 4 others • Feb 4 • 1.28k
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published Apr 14 • 280
view article Article Topic 33: Slim Attention, KArAt, XAttention and Multi-Token Attention Explained – What’s Really Changing in Transformers? By Kseniase and 1 other • Apr 4 • 14
view article Article DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge By NormalUhr • Feb 7 • 197
view article Article Introducing multi-backends (TRT-LLM, vLLM) support for Text Generation Inference By mfuntowicz and 1 other • Jan 16 • 75