Article • From GRPO to DAPO and GSPO: What, Why, and How • By NormalUhr • 10 days ago
Pre-Trained Policy Discriminators are General Reward Models • Paper • 2507.05197 • Published Jul 7
Article • Efficient LLM Pretraining: Packed Sequences and Masked Attention • By sirluk • Oct 7, 2024
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures • Paper • 2505.09343 • Published May 14
Article • How 🤗 Accelerate runs very large models thanks to PyTorch • By sgugger • Sep 27, 2022