Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Paper • 2305.18290 • Published • 48
Refer to additional papers: https://link.springer.com/article/10.1007/s10994-014-5458-8 and https://link.springer.com/article/10.1007/BF00992696