Article: The Transformers Library: standardizing model definitions • By lysandre and 3 others • 13 days ago • 102
Article: You could have designed state of the art positional encoding • By FL33TW00D-HF • Nov 25, 2024 • 276
Article: Welcome Llama 4 Maverick & Scout on Hugging Face! • By burtenshaw and 6 others • Apr 5 • 144
Article: LLM Inference on Edge: A Fun and Easy Guide to run LLMs via React Native on your Phone! • By medmekk and 1 other • Mar 7 • 59
Article: Open-source DeepResearch – Freeing our search agents • By m-ric and 4 others • Feb 4 • 1.25k
Article: Open-R1: a fully open reproduction of DeepSeek-R1 • By eliebak and 2 others • Jan 28 • 861
Paper: Domino: Eliminating Communication in LLM Training via Generic Tensor Slicing and Overlapping • arXiv:2409.15241 • Published Sep 23, 2024 • 1
Paper: Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone • arXiv:2404.14219 • Published Apr 22, 2024 • 257
Paper: Small-scale proxies for large-scale Transformer training instabilities • arXiv:2309.14322 • Published Sep 25, 2023 • 21
Article: How NuminaMath Won the 1st AIMO Progress Prize • By yfleureau and 7 others • Jul 11, 2024 • 120
Paper: Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets • arXiv:2201.02177 • Published Jan 6, 2022 • 2
Article: A failed experiment: Infini-Attention, and why we should keep trying? • By neuralink and 2 others • Aug 14, 2024 • 63
Paper: Grokfast: Accelerated Grokking by Amplifying Slow Gradients • arXiv:2405.20233 • Published May 30, 2024 • 6
Paper: Transformer Explainer: Interactive Learning of Text-Generative Models • arXiv:2408.04619 • Published Aug 8, 2024 • 162