yale-nlp 's Collections

RefDPO

Model and data collection for our work "Understanding Reference Policies in Direct Preference Optimization" (https://arxiv.org/abs/2407.13709)