- The Ultra-Scale Playbook: the ultimate guide to training LLMs on large GPU clusters
- BEE-spoke-data/smol_llama-101M-GQA: a 0.1B-parameter text-generation model