CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training • Paper • 2504.13161 • Published 9 days ago • 87
Predictive Data Selection: The Data That Predicts Is the Data That Teaches • Paper • 2503.00808 • Published Mar 2 • 57
Reward-Guided Speculative Decoding for Efficient LLM Reasoning • Paper • 2501.19324 • Published Jan 31 • 39
Offline Reinforcement Learning for LLM Multi-Step Reasoning • Paper • 2412.16145 • Published Dec 20, 2024 • 39
RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment • Paper • 2304.06767 • Published Apr 13, 2023 • 2
LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models • Paper • 2306.12420 • Published Jun 21, 2023 • 2