Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning • Paper • 2407.18248 • Published Jul 25, 2024
Learning Multi-Step Reasoning by Solving Arithmetic Tasks • Paper • 2306.01707 • Published Jun 2, 2023