Reasoning Model is Stubborn: Diagnosing Instruction Overriding in Reasoning Models Paper • 2505.17225 • Published 16 days ago • 64
MedGemma Release Collection Collection of Gemma 3 variants for performance on medical text and image comprehension to accelerate building healthcare-based AI applications. • 4 items • Updated 9 days ago • 152
view article Article Welcome Llama 4 Maverick & Scout on Hugging Face! By burtenshaw and 6 others • Apr 5 • 144
DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs Paper • 2503.07067 • Published Mar 10 • 32
view article Article DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge By NormalUhr • Feb 7 • 148
view article Article Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment By NormalUhr • Feb 11 • 41
EXAONE-Deep Collection EXAONE reasoning model series of 2.4B, 7.8B, and 32B, optimized for reasoning tasks including math and coding • 9 items • Updated Mar 18 • 86
XiYanSQL Models Collection The XiYanSQL series are foundational SQL models available in various sizes, including 3B, 7B, 14B, and 32B. • 8 items • Updated 13 days ago • 7
Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning Paper • 2502.14768 • Published Feb 20 • 48
🧠 Reasoning datasets Collection Datasets with reasoning traces for math and code released by the community • 24 items • Updated 20 days ago • 148
MEDIC: Towards a Comprehensive Framework for Evaluating LLMs in Clinical Applications Paper • 2409.07314 • Published Sep 11, 2024 • 58
Bridging Language Barriers in Healthcare: A Study on Arabic LLMs Paper • 2501.09825 • Published Jan 16 • 14
Stronger Models are NOT Stronger Teachers for Instruction Tuning Paper • 2411.07133 • Published Nov 11, 2024 • 39