---
pipeline_tag: reinforcement-learning
tags:
- code
new_version: wizardII/ArcherCodeR-1.5B
---

<div align="center">

# ✨ ArcherCodeR

<div>
🏹️ Reinforcement Learning for Smarter Code Reasoning in LLMs 🎯
</div>
</div>
<div>
<br>

<div align="center">

[](https://github.com/wizard-III/ArcherCodeR)
[](https://huggingface.co/wizardII/ArcherCodeR-1.5B)
[](https://huggingface.co/datasets/wizardII/ArcherCodeR-Dataset)
[](https://wandb.ai/wangjkpkucs-peking-university/ArcherCodeR?nw=nwuserwangjkpkucs)

</div>

</div>

## 📖 Overview

<div align="center">
<img src="assets/ArcherCodeR-1.5B-DAPO.png" width="100%"/>

<sub>ArcherCodeR-1.5B-DAPO's LiveCodeBench (LCB) score improves steadily as training progresses and reaches 27.24%, the best result among models of similar size (excluding our final ArcherCodeR-1.5B model).</sub>
</div>

**ArcherCodeR** is an open-source project focused on advancing code reasoning in LLMs through scalable, rule-based reinforcement learning. It offers full-stack reproducibility, including training code, datasets, models, and logs.

- **[`ArcherCodeR-1.5B`](https://huggingface.co/wizardII/ArcherCodeR-1.5B)** achieves state-of-the-art performance on code tasks (LiveCodeBench) among models of similar size. We have released all training components of the ArcherCodeR-1.5B-DAPO model; the final model's code and training pipeline are coming soon.
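
Rule-based RL for code derives its reward from verifiable checks (test execution) rather than a learned reward model. As a minimal illustrative sketch only, a binary reward can run the candidate program against input/output test cases; the function name and test format below are hypothetical, not this project's actual reward implementation:

```python
import subprocess
import sys

def rule_based_reward(candidate_code: str, tests: list[tuple[str, str]],
                      timeout: float = 5.0) -> float:
    """Binary rule-based reward: 1.0 iff the candidate program prints the
    expected stdout for every (stdin, expected_stdout) test case."""
    for stdin_data, expected in tests:
        try:
            result = subprocess.run(
                [sys.executable, "-c", candidate_code],
                input=stdin_data, capture_output=True,
                text=True, timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return 0.0  # hanging or too-slow programs earn no reward
        if result.returncode != 0 or result.stdout.strip() != expected.strip():
            return 0.0
    return 1.0

# Toy task: read an integer, print its double.
tests = [("3\n", "6"), ("10\n", "20")]
print(rule_based_reward("print(int(input()) * 2)", tests))  # 1.0
```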

## 📊 Evaluation

<div align="center">
<img src="./assets/figure_3.jpeg" width="75%"/>
<img src="./assets/figure_2.jpeg" width="75%"/>
</div>

### ⚖️ Results

| Model | LCB (8/1/24-2/1/25) Pass@1 | LCB (8/1/24-2/1/25) Pass@4 |
| ------------------------------------- | -------------------------- | -------------------------- |
| DeepSeek-R1-Distill-Qwen-1.5B | 16.9 | — |
| DeepSeek-R1-Distill-Qwen-1.5B | 16.40 | 25.81 |
| DeepCoder-1.5B | 25.1 | — |
| DeepCoder-1.5B | 23.03 | 30.82 |
| Nemotron-Research-Reasoning-Qwen-1.5B | 23.81 | — |
| Nemotron-Research-Reasoning-Qwen-1.5B | 25.45 | 34.40 |
| ArcherCodeR-1.5B-DAPO | 25.45 | 35.13 |
| ArcherCodeR-1.5B (32k) | 28.49 | 38.71 |
| **ArcherCodeR-1.5B (48k)** | 29.30 | 39.07 |
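
Pass@k metrics like those above are conventionally computed with the standard unbiased estimator: draw k samples without replacement from n generations per problem, of which c pass. A small sketch (the per-problem sample counts below are illustrative; we do not specify the exact n used in our runs here):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples,
    drawn without replacement from n generations of which c are
    correct, solves the problem."""
    if n - c < k:  # every size-k draw must contain a correct sample
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 8 generations per problem, 2 of them correct (illustrative):
print(round(pass_at_k(8, 2, 1), 4))  # 0.25
print(round(pass_at_k(8, 2, 4), 4))  # 0.7857
```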

## Technical Report

The technical report will be released soon.

## Acknowledgements

- We build our model upon [`DeepSeek-R1-Distill-Qwen-1.5B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B).
- Training was carried out with a modified version of [verl](https://github.com/volcengine/verl).

## Citation

Please cite the following:

```bibtex
@misc{archercoder2025,
  title={ArcherCodeR},
  author={Jiakang Wang},
  note={Blog},
  year={2025}
}
```