Position: AI Competitions Provide the Gold Standard for Empirical Rigor in GenAI Evaluation Paper • 2505.00612 • Published 23 days ago • 9
Article The Transformers Library: standardizing model definitions By lysandre and 3 others • 10 days ago • 100
AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset Paper • 2504.16891 • Published about 1 month ago • 21
The Bitter Lesson Learned from 2,000+ Multilingual Benchmarks Paper • 2504.15521 • Published Apr 22 • 64
Article Visualize and understand GPU memory in PyTorch By qgallouedec • Dec 24, 2024 • 223
Light-R1 Collection Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond • 7 items • Updated Mar 13 • 12
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking Paper • 2501.04519 • Published Jan 8 • 277
Article 🐺🐦⬛ LLM Comparison/Test: 25 SOTA LLMs (including QwQ) through 59 MMLU-Pro CS benchmark runs By wolfram • Dec 4, 2024 • 79