Surrogate Signals from Format and Length: Reinforcement Learning for Solving Mathematical Problems without Ground Truth Answers Paper • 2505.19439 • Published May 26 • 30
BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Competitive Large Language Model Baseline Paper • 2408.15079 • Published Aug 27, 2024 • 55
Small Models, Big Insights: Leveraging Slim Proxy Models To Decide When and What to Retrieve for LLMs Paper • 2402.12052 • Published Feb 19, 2024