Submitted by Yifan Zhang 8 On the Design of KL-Regularized Policy Gradient Algorithms for LLM Reasoning math-ai 60 2
Submitted by Yifan Zhang 3 Training and Evaluating Language Models with Template-based Data Generation math-ai 12 3
Submitted by Quanquan Gu 8 General Preference Modeling with Preference Representations for Aligning Language Models math-ai 37 4
Submitted by AK 17 AutoMathText: Autonomous Data Selection with Language Models for Mathematical Texts math-ai 89 2