Article • You could have designed state of the art positional encoding • By FL33TW00D-HF • Nov 25, 2024 • 342
Article • A Review on the Evolvement of Load Balancing Strategy in MoE LLMs: Pitfalls and Lessons • By NormalUhr • Feb 4 • 14
moonshotai/Kimi-VL-A3B-Thinking-2506 Image-Text-to-Text • 16B • Updated 1 day ago • 23.9k • 269
Running • 3.08k • The Ultra-Scale Playbook 🌌 • The ultimate guide to training LLM on large GPU Clusters
Article • Vision Language Models (Better, Faster, Stronger) • By merve and 4 others • May 12 • 507
Article • nanoVLM: The simplest repository to train your VLM in pure PyTorch • By ariG23498 and 6 others • May 21 • 204