Model Merging in Pre-training of Large Language Models Paper • 2505.12082 • Published May 17