On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification Paper • 2508.05629 • Published 1 day ago • 77
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens Paper • 2508.01191 • Published 7 days ago • 173
Balancing Speciality and Versatility: a Coarse to Fine Framework for Supervised Fine-tuning Large Language Model Paper • 2404.10306 • Published Apr 16, 2024 • 1
Optimizing Language Model's Reasoning Abilities with Weak Supervision Paper • 2405.04086 • Published May 7, 2024 • 2
Can LLMs Learn from Previous Mistakes? Investigating LLMs' Errors to Boost for Reasoning Paper • 2403.20046 • Published Mar 29, 2024
Eliminating Reasoning via Inferring with Planning: A New Framework to Guide LLMs' Non-linear Thinking Paper • 2310.12342 • Published Oct 18, 2023
SMoA: Improving Multi-agent Large Language Models with Sparse Mixture-of-Agents Paper • 2411.03284 • Published Nov 5, 2024
The Quest for Efficient Reasoning: A Data-Centric Benchmark to CoT Distillation Paper • 2505.18759 • Published May 24 • 12
PhyX: Does Your Model Have the "Wits" for Physical Reasoning? Paper • 2505.15929 • Published May 21 • 49