LLM4SR: A Survey on Large Language Models for Scientific Research • Paper • 2501.04306 • Published 5 days ago • 29
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs • Paper • 2406.18629 • Published Jun 26, 2024 • 41