---
license: mit
datasets:
- wizardII/ArcherCodeR-Dataset
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
pipeline_tag: reinforcement-learning
tags:
- code
new_version: wizardII/ArcherCodeR-1.5B
language:
- en
---
<div align="center">

# ✨ ArcherCodeR

🏹️ Reinforcement Learning for Enhanced Code Reasoning in LLMs 🎯

<br>

[Code](https://github.com/wizard-III/ArcherCodeR) ·
[Model](https://huggingface.co/wizardII/ArcherCodeR-1.5B) ·
[Dataset](https://huggingface.co/datasets/wizardII/ArcherCodeR-Dataset) ·
[W&B Logs](https://wandb.ai/wangjkpkucs-peking-university/ArcherCodeR?nw=nwuserwangjkpkucs) ·
[Zhihu Post](https://zhuanlan.zhihu.com/p/1918765619614057424)

</div>
## Overview
<div align="center">
<img src="assets/ArcherCodeR-1.5B-DAPO.png" width="100%"/>
<sub>ArcherCodeR-1.5B-DAPO improves steadily on LiveCodeBench (LCB) over the course of training, reaching a 27.24% LCB score.</sub>
</div>
**ArcherCodeR** is an open-source initiative for enhancing code reasoning in large language models through scalable, rule-governed reinforcement learning. We release everything needed for full reproducibility:
- Training code and pipelines
- Curated datasets
- Trained models
- Complete training logs
**Current Models**:
- **[ArcherCodeR-1.5B-DAPO](https://huggingface.co/wizardII/ArcherCodeR-1.5B-DAPO)** - state-of-the-art on LiveCodeBench among models of comparable scale (excluding our final ArcherCodeR-1.5B); all training components for this model are fully released.
- **[ArcherCodeR-1.5B](https://huggingface.co/wizardII/ArcherCodeR-1.5B)** - state-of-the-art among similarly sized models; its training pipeline is being released progressively. A minimal inference sketch for either checkpoint is shown below.
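Both checkpoints can be loaded like any causal LM on the Hub. The snippet below is a minimal inference sketch, assuming the standard 🤗 Transformers chat-template workflow inherited from the DeepSeek-R1-Distill-Qwen-1.5B base; the prompt and sampling settings are illustrative and are not the official evaluation setup.

```python
# Minimal inference sketch (assumes the checkpoint ships a chat template,
# as the DeepSeek-R1-Distill-Qwen-1.5B base does).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "wizardII/ArcherCodeR-1.5B-DAPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "user",
     "content": "Write a Python function that returns the n-th Fibonacci number."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Sampling settings here are illustrative; the evaluation in this card uses
# temperature=0.8 with a 32k-token generation budget.
outputs = model.generate(inputs, max_new_tokens=2048, temperature=0.8, do_sample=True)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```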
## Evaluation
Performance on LiveCodeBench. Pass@1 is the average success rate over 4 independent samples per problem, and Pass@4 counts a problem as solved if any of the 4 samples passes (see the sketch after the notes below). To ensure consistency, we re-evaluated all comparable open-source models with identical evaluation scripts and parameters (temperature=0.8, max_gen_length=32k).
The detailed results are shown in the table below.
| Model | LCB (8/1/24-2/1/25)(Pass@1) | LCB (8/1/24-2/1/25)(Pass@4) |
| --------------------------------------------- | --------------------------- | --------------------------- |
| DeepSeek-R1-Distill-Qwen-1.5B | 16.9 | — |
| DeepSeek-R1-Distill-Qwen-1.5B(Tested) | 16.40 | 25.81 |
| DeepCoder-1.5B | 25.1 | — |
| DeepCoder-1.5B(Tested) | 23.03 | 30.82 |
| Nemotron-Research-Reasoning-Qwen-1.5B | 23.81 | — |
| Nemotron-Research-Reasoning-Qwen-1.5B(Tested) | 25.45 | 34.40 |
| **ArcherCodeR-1.5B-DAPO** | 26.70 | 36.56 |
| **ArcherCodeR-1.5B(32k)** | 28.49 | 38.71 |
| **ArcherCodeR-1.5B(48k)** | 29.30 | 39.07 |
Note:
1. Evaluation variance for the same model is typically within ±0.5 across multiple runs.
2. DeepCoder consistently scored around 23 in our tests, lower than its reported 25.1.
3. NVIDIA's Nemotron-Research-Reasoning-Qwen-1.5B slightly outperformed its reported score, possibly due to different parameter settings in the original evaluation.
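For reference, the sketch below shows how Pass@1 and Pass@4 are read from per-sample verdicts using the standard unbiased pass@k estimator (Chen et al., 2021); with 4 samples per problem it reduces to the mean per-sample accuracy for Pass@1 and the any-sample success rate for Pass@4. The data layout and variable names are illustrative, not the repository's actual evaluation code.

```python
import numpy as np

def pass_at_k(num_samples: int, num_correct: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021)."""
    if num_samples - num_correct < k:
        return 1.0
    return 1.0 - np.prod(
        1.0 - k / np.arange(num_samples - num_correct + 1, num_samples + 1)
    )

# results[i][j] is True if sample j for problem i passed all tests (toy data).
results = [
    [True, False, True, False],
    [False, False, False, False],
    [True, True, True, True],
]

pass1 = np.mean([pass_at_k(4, sum(r), 1) for r in results])  # mean per-sample accuracy
pass4 = np.mean([pass_at_k(4, sum(r), 4) for r in results])  # any-sample success rate
print(f"Pass@1={pass1:.2%}  Pass@4={pass4:.2%}")
```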
## Technical Report
Coming soon.
## Acknowledgements
- We build our model upon [`DeepSeek-R1-Distill-Qwen-1.5B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B).
- Training was carried out with a modified version of [verl](https://github.com/volcengine/verl).