Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision • Paper • 2411.16579 • Published Nov 25, 2024 • 2
SALSA: Soup-based Alignment Learning for Stronger Adaptation in RLHF • Paper • 2411.01798 • Published Nov 4, 2024 • 8
Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders • Paper • 2410.22366 • Published Oct 28, 2024 • 77
VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment • Paper • 2410.01679 • Published Oct 2, 2024 • 24
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models • Paper • 2410.05229 • Published Oct 7, 2024 • 22
Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization • Paper • 2409.12903 • Published Sep 19, 2024 • 22