DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining Paper • 2305.10429 • Published May 17, 2023 • 3