arXiv:2407.15762

Conditioned Language Policy: A General Framework for Steerable Multi-Objective Finetuning

Published on Jul 22
· Submitted by akhaliq on Jul 23
Authors:
Le Hou, et al.

Abstract

Reward-based finetuning is crucial for aligning language policies with intended behaviors (e.g., creativity and safety). A key challenge is to develop steerable language models that trade off multiple (conflicting) objectives in a flexible and efficient manner. This paper presents Conditioned Language Policy (CLP), a general framework for finetuning language models on multiple objectives. Building on techniques from multi-task training and parameter-efficient finetuning, CLP can learn steerable models that effectively trade off conflicting objectives at inference time. Notably, this does not require training or maintaining multiple models to achieve different trade-offs between the objectives. Through an extensive set of experiments and ablations, we show that the CLP framework learns steerable models that outperform and Pareto-dominate the current state-of-the-art approaches for multi-objective finetuning.
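
To make the core idea concrete: a single policy is conditioned on a weighting over reward objectives, with weightings sampled during training and chosen by the user at inference. Below is a minimal, hypothetical Python sketch of that weighted-reward conditioning; the function names, stand-in reward models, and policy interface are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of conditioning on a reward weighting (hypothetical code,
# not the authors' implementation).
import random


def sample_weights(num_objectives):
    """Sample a random weighting over objectives (a point on the simplex)."""
    raw = [random.random() for _ in range(num_objectives)]
    total = sum(raw)
    return [r / total for r in raw]


def scalarize(rewards, weights):
    """Combine per-objective rewards into one scalar via a weighted sum."""
    return sum(w * r for w, r in zip(weights, rewards))


# Hypothetical per-objective reward functions (stand-ins for, e.g.,
# creativity and safety reward models).
reward_fns = [
    lambda text: len(set(text.split())) / max(len(text.split()), 1),  # "creativity" stand-in
    lambda text: 1.0 if "unsafe" not in text.lower() else 0.0,        # "safety" stand-in
]


def training_step(policy_generate, prompt):
    """One conceptual step: sample a trade-off, condition generation on it,
    and compute the weighted reward that would drive the policy update."""
    weights = sample_weights(len(reward_fns))
    # The policy sees the sampled weights (e.g., via its inputs or
    # weight-dependent parameters), so one model covers many trade-offs.
    response = policy_generate(prompt, weights)
    rewards = [fn(response) for fn in reward_fns]
    return scalarize(rewards, weights), weights


if __name__ == "__main__":
    # Dummy "policy" used only to make the sketch runnable.
    dummy_policy = lambda prompt, weights: f"A response to: {prompt}"
    reward, weights = training_step(dummy_policy, "Write a short story.")
    print(f"weights={weights}, weighted reward={reward:.3f}")
```

At inference time, the same conditioned model would be given a user-chosen weight vector (e.g., favoring creativity over safety) instead of a sampled one, which is what allows one model to be steered across trade-offs without retraining.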


Models citing this paper 0

No models link this paper.


Datasets citing this paper 0

No datasets link this paper.


Spaces citing this paper 0

No Spaces link this paper.


Collections including this paper 2