Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning • arXiv:2408.10075 • Published Aug 19, 2024
TÜLU 3: Pushing Frontiers in Open Language Model Post-Training • arXiv:2411.15124 • Published Nov 22, 2024
Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback • arXiv:2406.09279 • Published Jun 13, 2024
Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2 • arXiv:2311.10702 • Published Nov 17, 2023
How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources • arXiv:2306.04751 • Published Jun 7, 2023
HINT: Hypernetwork Instruction Tuning for Efficient Zero-Shot Generalisation • arXiv:2212.10315 • Published Dec 20, 2022
TESS: Text-to-Text Self-Conditioned Simplex Diffusion • arXiv:2305.08379 • Published May 15, 2023