LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization
This repo provides the Qwen2.5-7B-LongPO-128K checkpoint from our paper "LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization".
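
Below is a minimal usage sketch (not part of the official release) for loading the checkpoint with Hugging Face `transformers`; the model ID is a placeholder for the actual repo name or a local path, and the generation settings are illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder ID: replace with the actual Hugging Face repo name or a local
# path to the released Qwen2.5-7B-LongPO-128K checkpoint.
model_id = "path/to/Qwen2.5-7B-LongPO-128K"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # bf16 keeps memory manageable for a 7B model
    device_map="auto",
)

# Standard Qwen2.5 chat usage; put a long document in the user turn.
messages = [{"role": "user", "content": "Summarize the following document:\n<long document here>"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Greedy decoding, matching the evaluation setting reported below.
output_ids = model.generate(input_ids, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```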

Highlights of LongPO
- Self-evolving long-context alignment without annotations from humans or stronger LLMs (see the illustrative sketch after this list).
- Extends the context length while keeping the model aligned, in a single training stage.
- No degradation of short-context capabilities.
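
For intuition only, the sketch below shows a generic DPO-style preference loss in PyTorch, where the "chosen" response would be the model's own answer conditioned on a short excerpt and the "rejected" response its answer conditioned on the full long context. This is a simplified illustration under those assumptions, not the exact LongPO objective (which additionally uses a short-to-long constraint); the function name and arguments are hypothetical.

```python
import torch.nn.functional as F

def short_to_long_preference_loss(policy_chosen_logps, policy_rejected_logps,
                                  ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Generic DPO-style loss over short-to-long preference pairs (illustrative only).

    chosen   : the model's own response conditioned on a short excerpt
    rejected : its response conditioned on the full long context
    All arguments are per-example sequence log-probabilities (1-D tensors).
    """
    policy_logratio = policy_chosen_logps - policy_rejected_logps  # log pi(y_w|x) - log pi(y_l|x)
    ref_logratio = ref_chosen_logps - ref_rejected_logps           # same ratio under the frozen reference model
    return -F.logsigmoid(beta * (policy_logratio - ref_logratio)).mean()
```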
Models and Training Data
\* indicates an experimental version (for rebuttal purposes) that may not have been fully tuned or trained on sufficient data to reach convergence.
Evaluation
InfiniteBench
| Model | Train/Claimed Length | En.Sum | En.QA | En.MC | AVG. |
|---|---|---|---|---|---|
| GPT-4-128K | 128K | 14.73 | 22.44 | 67.25 | 34.81 |
| Qwen2-72B | 128K | 24.32ᵇ | 7.03ᵇ | 72.05ᵇ | 34.47ᵇ |
| LLaMA 3.1-70B | 128K | 33.55ᵇ | 36.08ᵇ | 69.00ᵇ | 46.21ᵇ |
| LLaMA 3.1-8B | 128K | 28.06ᵇ | 30.47ᵇ | 58.08ᵇ | 38.87ᵇ |
| GLM-4-9B | 128K | 14.84ᵇ | 9.51ᵇ | 67.25ᵇ | 30.53ᵇ |
| GLM-4-9B-1M | 1M | 28.3 | 9.7 | 68.6 | 35.53 |
| LWM-7B-1M | 1M | 4.33ᵇ | 0.0ᵇ | 3.06ᵇ | 2.46ᵇ |
| YaRN-Mistral-7B | 128K | 9.09 | 9.55 | 27.95 | 15.53 |
| Mistral-7B | 32K | 22.13 | 4.93 | 14.41 | 13.82 |
| - SFT | 128K | 23.44 | 13.45 | 53.21 | 30.03 |
| - DPO | 128K | 15.21 | 10.34 | 48.14 | 25.56 |
| - LongPO (iter1) | 128K | 27.05 | 23.51 | 67.25 | 39.27 |
| - LongPO (iter2) | 256K | 28.16 | 24.43 | 66.35 | 39.65 |
| - LongPO (iter3) | 512K | 29.10 | 27.85 | 66.67 | 41.21 |
| Qwen2.5-7B | 128K | 22.89 | 6.08 | 52.4 | 27.12 |
| - LongPO (iter1) | 128K | 32.06 | 17.32 | 72.05 | 40.48 |
- Our results are evaluated with greedy decoding.
- Baseline results marked with ᵇ are evaluated by us; unmarked baseline results are taken from their official reports.
RULER
| Model | NIAH | VT | AGG | QA | AVG (13 tasks) |
|---|---|---|---|---|---|
| Qwen2.5-7B-Instruct | 82.10 | 80.09 | 74.50 | 54.30 | 76.50 |
| Qwen2.5-7B-LongPO-128K | 95.82 | 89.71 | 78.67 | 59.40 | 87.11 |
| Mistral-7B-Instruct-v0.2 | 72.60 | 74.40 | 64.40 | 52.20 | 68.40 |
| Mistral-7B-LongPO-128K | 96.88 | 96.49 | 71.55 | 64.81 | 88.02 |
| Mistral-7B-LongPO-256K-EXP | 96.80 | 97.00 | 69.14 | 64.87 | 87.65 |
| Mistral-7B-LongPO-512K-EXP | 97.28 | 97.48 | 69.22 | 64.92 | 88.00 |
Short Context
| Model | MMLU | ARC-C | Hellaswag | Winogrande | Avg |
|---|---|---|---|---|---|
| Mistral-7B-Instruct-v0.2 | 59.15 | 59.26 | 83.2 | 78.4 | 70.00 |
| Mistral-7B-LongPO-128K | 59.99 | 59.34 | 82.99 | 78.53 | 70.21 |
| Mistral-7B-LongPO-256K-EXP | 59.47 | 60.28 | 83.14 | 78.14 | 70.26 |
| Mistral-7B-LongPO-512K-EXP | 59.51 | 60.58 | 82.87 | 77.66 | 70.16 |
| Qwen2.5-7B-Instruct | 74.28 | 67.15 | 81.41 | 74.66 | 74.38 |
| Qwen2.5-7B-LongPO-128K | 73.64 | 65.70 | 80.82 | 74.98 | 73.79 |
Citation
If you find our project useful, please star our repo and cite our paper as follows:
@inproceedings{
chen2025longpo,
title={Long{PO}: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization},
author={Guanzheng Chen and Xin Li and Michael Shieh and Lidong Bing},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=qTrEq31Shm}
}