RefDPO - a yale-nlp Collection


Updated Jul 19, 2024

Models and data for our paper "Understanding Reference Policies in Direct Preference Optimization" (https://arxiv.org/abs/2407.13709).
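The `beta` suffix in the model IDs presumably denotes the KL-constraint strength β of DPO, and the paper's subject is the reference policy that this constraint is measured against. As a minimal sketch, assuming the standard DPO objective and summed per-response log-probabilities as inputs (the function and argument names are illustrative, not taken from the authors' code), the per-pair loss looks like this:

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for a single preference pair (illustrative sketch).

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference policy.
    """
    # Log-ratio of policy to reference for the preferred and dispreferred responses.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # beta scales the implicit reward margin; a larger beta is a stronger
    # constraint that keeps the policy closer to the reference.
    z = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(z)) == softplus(-z), split into two branches to avoid overflow.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# When the policy equals the reference, the margin is 0 and the loss is log(2).
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
```

Swapping in a different reference model changes only the `ref_*` log-probabilities, which is the knob the reference-model variants in this collection appear to vary.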


Datasets

yale-nlp/RefDPO

Updated Jul 18, 2024 • 312k • 48

Models
yale-nlp/tulu2-7b-dpo-beta-0.1

Text Generation • 7B • Updated Jul 18, 2024 • 1
yale-nlp/tulu2-7b-dpo-beta-0.02

Text Generation • 7B • Updated Jul 18, 2024 • 2
yale-nlp/tulu2-7b-dpo-beta-0.005

Text Generation • 7B • Updated Jul 18, 2024 • 1
yale-nlp/mistral-7b-dpo-beta-0.1

Text Generation • 7B • Updated Jul 18, 2024 • 2
yale-nlp/mistral-7b-dpo-beta-0.05

Text Generation • 7B • Updated Jul 18, 2024 • 2
yale-nlp/mistral-7b-dpo-beta-0.02

Text Generation • 7B • Updated Jul 18, 2024 • 2
yale-nlp/mistral-7b-dpo-beta-0.01

Text Generation • 7B • Updated Jul 18, 2024 • 2
yale-nlp/mistral-7b-dpo-beta-0.005

Text Generation • 7B • Updated Jul 18, 2024 • 2
yale-nlp/mistral-likelihood

Text Generation • 7B • Updated Jul 18, 2024 • 2
yale-nlp/mistral-probability

Text Generation • 7B • Updated Jul 18, 2024 • 1
yale-nlp/mistral-7b-dpo-mistralv2-7b-beta-10.0

Text Generation • 7B • Updated Jul 18, 2024 • 2
yale-nlp/mistral-7b-dpo-mistralv2-7b-beta-1.0

Text Generation • 7B • Updated Jul 18, 2024 • 2
yale-nlp/mistral-7b-dpo-mistralv2-7b-beta-0.1

Text Generation • 7B • Updated Jul 18, 2024 • 2
yale-nlp/mistral-7b-dpo-mistralv2-7b-beta-0.01

Text Generation • 7B • Updated Jul 18, 2024 • 2
yale-nlp/mistral-7b-dpo-mistralv2-7b-beta-0.005

Text Generation • 7B • Updated Jul 18, 2024 • 2
yale-nlp/mistral-7b-dpo-llama3-70b-beta-10.0

Text Generation • 7B • Updated Jul 18, 2024 • 4
yale-nlp/mistral-7b-dpo-llama3-70b-beta-1.0

Text Generation • 7B • Updated Jul 18, 2024 • 2
yale-nlp/mistral-7b-dpo-llama3-70b-beta-0.1

Text Generation • 7B • Updated Jul 18, 2024 • 2
yale-nlp/mistral-7b-dpo-llama3-70b-beta-0.01

Text Generation • 7B • Updated Jul 18, 2024 • 3
yale-nlp/mistral-7b-dpo-llama3-70b-beta-0.005

Text Generation • 7B • Updated Jul 18, 2024 • 2
yale-nlp/tulu2-7b-dpo-mistralv2-7b-beta-10.0

Text Generation • 7B • Updated Jul 18, 2024 • 2
yale-nlp/tulu2-7b-dpo-mistralv2-7b-beta-1.0

Text Generation • 7B • Updated Jul 18, 2024 • 2
yale-nlp/tulu2-7b-dpo-mistralv2-7b-beta-0.1

Text Generation • 7B • Updated Jul 18, 2024 • 2
yale-nlp/tulu2-7b-dpo-mistralv2-7b-beta-0.01

Text Generation • 7B • Updated Jul 18, 2024 • 2
yale-nlp/tulu2-7b-dpo-llama3-70b-beta-10.0

Text Generation • 7B • Updated Jul 18, 2024 • 2
yale-nlp/tulu2-7b-dpo-llama3-70b-beta-1.0

Text Generation • 7B • Updated Jul 18, 2024 • 2
yale-nlp/tulu2-7b-dpo-llama3-70b-beta-0.1

Text Generation • 7B • Updated Jul 18, 2024 • 2
yale-nlp/tulu2-7b-dpo-llama3-70b-beta-0.01

Text Generation • 7B • Updated Jul 18, 2024 • 2
yale-nlp/tulu2-7b-dpo-llama3-70b-beta-0.005

Text Generation • 7B • Updated Jul 18, 2024 • 2
yale-nlp/tulu2-7b-dpo-beta-0.05

Text Generation • 7B • Updated Jul 19, 2024 • 2
yale-nlp/tulu2-7b-dpo-beta-0.01

Text Generation • 7B • Updated Jul 19, 2024 • 2
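The model IDs above appear to follow a regular scheme: a policy model, `dpo`, an optional reference model, then `beta-<value>`. Under that assumption (the page itself does not define the scheme), a hypothetical helper like the one below splits an ID into its parts, which makes the β sweeps for each policy/reference pair easier to see. IDs that do not fit the pattern, such as `yale-nlp/mistral-likelihood`, return `None`.

```python
import re
from typing import NamedTuple, Optional


class RunSpec(NamedTuple):
    policy: str               # model fine-tuned with DPO, e.g. "mistral-7b"
    reference: Optional[str]  # reference model named in the ID, or None if none is named
    beta: float               # numeric value after "beta-"


# Hypothetical pattern inferred from the IDs: "<policy>-dpo[-<reference>]-beta-<value>".
_ID_PATTERN = re.compile(
    r"^(?P<policy>.+?)-dpo(?:-(?P<reference>.+?))?-beta-(?P<beta>\d+(?:\.\d+)?)$"
)


def parse_model_id(model_id: str) -> Optional[RunSpec]:
    """Split a repo ID such as 'yale-nlp/mistral-7b-dpo-llama3-70b-beta-0.1'."""
    name = model_id.split("/", 1)[-1]  # drop the "yale-nlp/" namespace
    m = _ID_PATTERN.match(name)
    if m is None:
        return None  # ID does not follow the assumed DPO naming scheme
    return RunSpec(m["policy"], m["reference"], float(m["beta"]))


for repo_id in [
    "yale-nlp/tulu2-7b-dpo-beta-0.1",
    "yale-nlp/mistral-7b-dpo-mistralv2-7b-beta-0.005",
    "yale-nlp/tulu2-7b-dpo-llama3-70b-beta-10.0",
    "yale-nlp/mistral-probability",
]:
    print(repo_id, "->", parse_model_id(repo_id))
```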
