arxiv:2406.13542

Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models

Published on Jun 19
· Submitted by davanstrien on Jun 21

Abstract

One core capability of large language models (LLMs) is to follow natural language instructions. However, the issue of automatically constructing high-quality training data to enhance the complex instruction-following abilities of LLMs without manual annotation remains unresolved. In this paper, we introduce AutoIF, the first scalable and reliable method for automatically generating instruction-following training data. AutoIF transforms the validation of instruction-following data quality into code verification, requiring LLMs to generate instructions, the corresponding code to check the correctness of the instruction responses, and unit test samples to verify the code's correctness. Then, execution feedback-based rejection sampling can generate data for Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) training. AutoIF achieves significant improvements across three training algorithms, SFT, Offline DPO, and Online DPO, when applied to the top open-source LLMs, Qwen2 and LLaMA3, in self-alignment and strong-to-weak distillation settings. Our code is publicly available at https://github.com/QwenLM/AutoIF.
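To make the pipeline in the abstract concrete, below is a minimal sketch of the execution-feedback filtering step, not the repository's actual implementation (see https://github.com/QwenLM/AutoIF for the real code). It assumes the LLM has already produced a verifier script that defines a `check(response)` function, a set of unit-test pairs, and several candidate responses; the `check` naming convention and the function `autoif_filter` are illustrative assumptions.

```python
# Sketch of AutoIF-style execution-feedback filtering (illustrative, not the official API).
from typing import Optional

def autoif_filter(verifier_src: str,
                  test_cases: list[tuple[str, bool]],
                  candidate_responses: list[str]) -> Optional[list[str]]:
    # Step 1: execute the LLM-generated verifier and recover its check() function.
    namespace: dict = {}
    try:
        exec(verifier_src, namespace)
        check = namespace["check"]  # assumed convention: verifier exposes check(response)
    except Exception:
        return None  # verifier code does not run: discard this instruction

    # Step 2: cross-validate the verifier against the LLM-generated unit tests.
    try:
        if not all(bool(check(resp)) == expected for resp, expected in test_cases):
            return None  # verifier disagrees with its own tests: unreliable, discard
    except Exception:
        return None

    # Step 3: rejection sampling -- keep only responses the verifier accepts.
    # Accepted responses can serve as SFT data; accepted/rejected pairs can feed DPO.
    return [r for r in candidate_responses if check(r)]

# Toy usage: an instruction requiring the answer to contain exactly three words.
verifier = "def check(response):\n    return len(response.split()) == 3\n"
tests = [("one two three", True), ("too few", False)]
print(autoif_filter(verifier, tests, ["red green blue", "just two"]))
# -> ['red green blue']
```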

Community

Paper submitter

Nice to see the paper also comes with the code: https://github.com/QwenLM/AutoIF!

Paper author · edited Jun 28

Generate verifiable instruction-following data with AutoIF! @Alibaba_Qwen. AutoIF validates instructions by executing the generated code to check the correctness of responses. In self-alignment and strong-to-weak distillation settings, it improves models by up to 15% and is the first to reach 90% Loose Instruction accuracy on IFEval.

