Multi-Preference Optimization: Generalizing DPO via Set-Level Contrasts
Abstract
Multi-Preference Optimization generalizes Direct Preference Optimization by extending the Bradley-Terry model to groupwise comparisons and achieves state-of-the-art performance in preference-based alignment.
Direct Preference Optimization (DPO) has become a popular approach for aligning language models using pairwise preferences. However, in practical post-training pipelines, on-policy generation typically yields multiple candidate responses per prompt, which are scored by a reward model to guide learning. In this setting, we propose Multi-Preference Optimization (MPO), a generalization of DPO that optimizes over entire sets of responses by extending the Bradley-Terry model to groupwise comparisons between chosen and rejected sets. To further enhance learning, MPO employs deviation-based weighting, which emphasizes outlier responses that deviate most from the mean reward, effectively inducing a self-paced curriculum. We theoretically prove that MPO reduces alignment bias at a rate of $O\left(\frac{1}{n}\right)$ with respect to the number of responses per query. Empirically, MPO achieves state-of-the-art performance on the UltraFeedback benchmark and yields up to ~17.5% improvement over the state-of-the-art baseline in length-controlled win rate on AlpacaEval2, establishing a new baseline for preference-based alignment.
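To make the set-level contrast concrete, below is a minimal PyTorch sketch of a loss with the two ingredients the abstract describes: a groupwise Bradley-Terry comparison between chosen and rejected response sets, and deviation-based weighting that emphasizes responses whose reward is far from the per-prompt mean. The abstract does not give the exact objective, so the function name `mpo_loss_sketch`, the above/below-mean split into sets, and the particular weighting and aggregation choices are illustrative assumptions rather than the paper's formulation.

```python
# Hypothetical sketch of an MPO-style set-level preference loss.
# Illustrates (1) a groupwise Bradley-Terry contrast between chosen and
# rejected sets and (2) deviation-based weighting of outlier responses.
import torch
import torch.nn.functional as F


def mpo_loss_sketch(policy_logps, ref_logps, rewards, beta=0.1):
    """Set-level loss for one prompt with n sampled responses.

    policy_logps, ref_logps, rewards: 1-D tensors of length n
    (one entry per response). All details below are assumptions
    made for illustration, not the paper's exact objective.
    """
    # DPO-style implicit scores: beta * log(pi_theta / pi_ref) per response.
    scores = beta * (policy_logps - ref_logps)

    # Deviation-based weights: up-weight responses whose reward deviates
    # most from the per-prompt mean reward (the "outliers").
    dev = (rewards - rewards.mean()).abs()
    weights = dev / (dev.sum() + 1e-8)

    # Split responses into a chosen set (above-mean reward) and a rejected set.
    chosen = rewards >= rewards.mean()
    rejected = ~chosen

    # Weighted aggregate score of each set.
    chosen_score = (weights[chosen] * scores[chosen]).sum() / (weights[chosen].sum() + 1e-8)
    rejected_score = (weights[rejected] * scores[rejected]).sum() / (weights[rejected].sum() + 1e-8)

    # Groupwise Bradley-Terry objective: the chosen set should be preferred
    # over the rejected set, analogous to pairwise DPO's -log sigmoid margin.
    return -F.logsigmoid(chosen_score - rejected_score)
```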