Enhancing LLM Reasoning with Iterative DPO: A Comprehensive Empirical Investigation • Paper • 2503.12854 • Published Mar 17
SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning • Paper • 2506.19767 • Published 2 days ago