Fantastic Pretraining Optimizers and Where to Find Them Paper • 2509.02046 • Published 5 days ago
BeyondWeb: Lessons from Scaling Synthetic Data for Trillion-scale Pretraining Paper • 2508.10975 • Published 23 days ago
Step-3 is Large yet Affordable: Model-system Co-design for Cost-effective Decoding Paper • 2507.19427 • Published Jul 25