On the Design of KL-Regularized Policy Gradient Algorithms for LLM Reasoning Paper • 2505.17508 • Published May 23, 2025 • 5
FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models Paper • 2505.02735 • Published May 5, 2025 • 31
Scaling Image Tokenizers with Grouped Spherical Quantization Paper • 2412.02632 • Published Dec 3, 2024 • 10
Training and Evaluating Language Models with Template-based Data Generation Paper • 2411.18104 • Published Nov 27, 2024 • 3
MARS: Unleashing the Power of Variance Reduction for Training Large Models Paper • 2411.10438 • Published Nov 15, 2024 • 13
DPLM-2: A Multimodal Diffusion Protein Language Model Paper • 2410.13782 • Published Oct 17, 2024 • 22
general-preference/GPO-Llama-3-8B-Instruct-GPM-2B Text Generation • 8B • Updated Oct 11, 2024 • 13 • 2
general-preference/SPPO-Llama-3-8B-Instruct-GPM-2B Text Generation • 8B • Updated Oct 11, 2024 • 16 • 1
General Preference Modeling with Preference Representations for Aligning Language Models Paper • 2410.02197 • Published Oct 3, 2024 • 9
ProteinBench: A Holistic Evaluation of Protein Foundation Models Paper • 2409.06744 • Published Sep 10, 2024 • 8