SmolLM2 Collection State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M • 16 items • Updated May 5 • 283
Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective Paper • 2506.14965 • Published Jun 17 • 49
A General Theoretical Paradigm to Understand Learning from Human Preferences Paper • 2310.12036 • Published Oct 18, 2023 • 16
Direct Preference Optimization: Your Language Model is Secretly a Reward Model Paper • 2305.18290 • Published May 29, 2023 • 63
view article Article Preference Tuning LLMs with Direct Preference Optimization Methods By kashif and 4 others • Jan 18, 2024 • 70
view article Article How to train a new language model from scratch using Transformers and Tokenizers By julien-c • Feb 14, 2020 • 44
view article Article Illustrating Reinforcement Learning from Human Feedback (RLHF) By natolambert and 3 others • Dec 9, 2022 • 319