Article • The Transformers Library: standardizing model definitions • By lysandre and 3 others • 13 days ago • 102
Article • You could have designed state of the art positional encoding • By FL33TW00D-HF • Nov 25, 2024 • 276
Article • Welcome Llama 4 Maverick & Scout on Hugging Face! • By burtenshaw and 6 others • Apr 5 • 144
Article • LLM Inference on Edge: A Fun and Easy Guide to run LLMs via React Native on your Phone! • By medmekk and 1 other • Mar 7 • 59
Article • Open-source DeepResearch – Freeing our search agents • By m-ric and 4 others • Feb 4 • 1.25k
Running 2.62k The Ultra-Scale Playbook 🌌 The ultimate guide to training LLM on large GPU Clusters
Article • Open-R1: a fully open reproduction of DeepSeek-R1 • By eliebak and 2 others • Jan 28 • 861
Domino: Eliminating Communication in LLM Training via Generic Tensor Slicing and Overlapping Paper • 2409.15241 • Published Sep 23, 2024 • 1