---
license: apache-2.0
datasets:
- Tongyi-Zhiwen/DocQA-RL-1.6K
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
tags:
- long-context
- large-reasoning-model
---
# QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
[License: Apache-2.0](https://opensource.org/licenses/Apache-2.0) | [arXiv](https://arxiv.org/abs/xxxx.xxxxx) | [GitHub](https://github.com/Tongyi-Zhiwen) | [ModelScope](https://modelscope.cn/organization/iic/) | [🤗 HuggingFace](https://huggingface.co/Tongyi-Zhiwen)
_**Fanqi Wan, Weizhou Shen, Shengyi Liao, Yingcheng Shi, Chenliang Li, Ziyi Yang, Ji Zhang, Fei Huang, Jingren Zhou, Ming Yan**_
_Qwen-Doc Team, Alibaba Group_
## 🎉 News
- **May 26, 2025:** 🔥 We release [🤗 QwenLong-L1-32B](https://huggingface.co/Tongyi-Zhiwen/QwenLong-L1-32B), the first long-context LRM trained with reinforcement learning for long-context reasoning. Experiments on seven long-context DocQA benchmarks show that **QwenLong-L1-32B outperforms flagship LRMs like OpenAI-o3-mini and Qwen3-235B-A22B and achieves performance on par with Claude-3.7-Sonnet-Thinking**, demonstrating leading performance among state-of-the-art LRMs.
- **May 26, 2025:** 🔥 We release [🤗 DocQA-RL-1.6K](https://huggingface.co/datasets/Tongyi-Zhiwen/DocQA-RL-1.6K), a specialized RL training dataset of 1.6K document question answering (DocQA) problems spanning mathematical, logical, and multi-hop reasoning domains (see the quickstart sketch below).
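For reference, here is a minimal quickstart sketch for loading the released checkpoint and dataset with the standard `transformers` and `datasets` APIs. The prompt template and generation settings below are illustrative assumptions, not an official usage recipe.

```python
# Minimal sketch: load QwenLong-L1-32B and the DocQA-RL-1.6K dataset.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Tongyi-Zhiwen/QwenLong-L1-32B"

# device_map="auto" shards the 32B model across available GPUs.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# The RL training set released alongside the model.
docqa = load_dataset("Tongyi-Zhiwen/DocQA-RL-1.6K")

# Build a long-context DocQA prompt; this exact format is an assumption
# for illustration, not necessarily the template used during training.
context = "..."   # your long document here
question = "..."  # your question about the document
messages = [{
    "role": "user",
    "content": f"Read the following document and answer the question.\n\n"
               f"{context}\n\nQuestion: {question}",
}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=4096)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```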
## 📚 Introduction
In this work, we propose QwenLong-L1, a novel reinforcement learning (RL) framework designed to facilitate the transition of large reasoning models (LRMs) from short-context proficiency to robust long-context generalization. Our preliminary experiments illustrate how the training dynamics of long-context reasoning RL differ from those of short-context reasoning RL.