DataComp-LM: In search of the next generation of training sets for language models Paper • 2406.11794 • Published Jun 17 • 48
Training Language Models on Synthetic Edit Sequences Improves Code Synthesis Paper • 2410.02749 • Published Oct 3 • 12
Towards a Unified View of Preference Learning for Large Language Models: A Survey Paper • 2409.02795 • Published Sep 4 • 72
ORPO: Monolithic Preference Optimization without Reference Model Paper • 2403.07691 • Published Mar 12 • 62
Intuitive Fine-Tuning: Towards Unifying SFT and RLHF into a Single Process Paper • 2405.11870 • Published May 20
LoRA Dropout as a Sparsity Regularizer for Overfitting Control Paper • 2404.09610 • Published Apr 15 • 1
Self-Distillation Bridges Distribution Gap in Language Model Fine-Tuning Paper • 2402.13669 • Published Feb 21 • 1
What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective Paper • 2410.23743 • Published 13 days ago • 57