Update README.md
README.md CHANGED
@@ -11,14 +11,14 @@ new_version: wizardII/ArcherCodeR-1.5B
---

<div align="center">

# ✨ ArcherCodeR

<div>
🏹️ Reinforcement Learning for Enhanced Code Reasoning in LLMs 🎯
</div>

</div>
<div>
<br>
@@ -29,26 +29,34 @@ new_version: wizardII/ArcherCodeR-1.5B
[Model](https://huggingface.co/wizardII/ArcherCodeR-1.5B)
[Dataset](https://huggingface.co/datasets/wizardII/ArcherCodeR-Dataset)
[W&B logs](https://wandb.ai/wangjkpkucs-peking-university/ArcherCodeR?nw=nwuserwangjkpkucs)
[Blog](https://zhuanlan.zhihu.com/p/xxx)

</div>

## Overview

<div align="center">
<img src="assets/ArcherCodeR-1.5B-DAPO.png" width="100%"/>

<sub>ArcherCodeR-1.5B-DAPO achieves progressive improvements on LiveCodeBench (LCB), reaching a 27.24% LCB score.</sub>
</div>

**ArcherCodeR** is an open-source initiative enhancing code reasoning in large language models through scalable, rule-governed reinforcement learning. We provide full-stack reproducibility, including:

- Training code and pipelines
- Curated datasets
- Trained models
- Complete training logs

**Current Models**:

- **[ArcherCodeR-1.5B-DAPO](https://huggingface.co/wizardII/ArcherCodeR-1.5B-DAPO)**: achieves state-of-the-art performance on code tasks (LiveCodeBench) among comparable-scale models (excluding our final ArcherCodeR-1.5B). All training components for this model are now fully released.
- **[ArcherCodeR-1.5B](https://huggingface.co/wizardII/ArcherCodeR-1.5B)**: SOTA among similarly sized models; its training pipeline is being released progressively.
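
Both checkpoints are hosted on the Hugging Face Hub, so a quick way to try them is the standard `transformers` API. A minimal sketch, assuming `transformers` and `torch` are installed; the prompt and generation settings below are illustrative, not our evaluation configuration:

```python
# Minimal sketch: load ArcherCodeR-1.5B from the Hugging Face Hub and
# sample one completion. Generation settings here are illustrative
# assumptions, not the repo's exact evaluation setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "wizardII/ArcherCodeR-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Write a Python function that checks whether a string is a palindrome."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=1024, do_sample=True, temperature=0.8)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```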

## Evaluation

Performance on LiveCodeBench: the Pass@1 metric represents the average performance across 4 independent sampling attempts. To ensure consistency, we re-evaluated all comparable open-source models using identical evaluation scripts and parameters (temperature=0.8, max_gen_length=32k).

The detailed results are shown in the table below.

| Model | LCB (8/1/24-2/1/25) Pass@1 | LCB (8/1/24-2/1/25) Pass@4 |
| --------------------------------------------- | --------------------------- | --------------------------- |
@@ -62,25 +70,15 @@ new_version: wizardII/ArcherCodeR-1.5B
| **ArcherCodeR-1.5B (32k)** | 28.49 | 38.71 |
| **ArcherCodeR-1.5B (48k)** | 29.30 | 39.07 |

Note:

1. Evaluation variance for the same model is typically within ±0.5 across multiple runs.
2. DeepCoder consistently scored around 23 in our tests, lower than its reported performance.
3. NVIDIA's Nemotron-Research-Reasoning-Qwen-1.5B slightly outperformed its reported score, potentially due to different parameter settings in their original evaluation.
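
For clarity on how the table's metrics are derived, here is a minimal sketch of computing Pass@1 and Pass@4 from per-problem correctness over n=4 samples, using the standard unbiased Pass@k estimator; this illustrates the metric definitions, not our exact evaluation script:

```python
# Minimal sketch of Pass@1 / Pass@4 over n=4 samples per problem.
# results[i][j] is True if sample j for problem i passed all tests.
# Illustrative only, not the exact evaluation script used in this repo.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@k estimator for one problem: probability that at
    least one of k samples is correct, given c of n samples passed."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

results = [
    [True, False, True, False],    # problem 0: 2/4 samples correct
    [False, False, False, False],  # problem 1: 0/4 samples correct
    [True, True, True, True],      # problem 2: 4/4 samples correct
]

n = 4
# Pass@1 reduces to the mean correctness across the 4 attempts.
pass1 = sum(pass_at_k(n, sum(r), 1) for r in results) / len(results)
# Pass@4 is whether any of the 4 samples solved the problem.
pass4 = sum(pass_at_k(n, sum(r), 4) for r in results) / len(results)
print(f"Pass@1 = {pass1:.2%}, Pass@4 = {pass4:.2%}")
```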

## Technical Report

Coming soon.

## Acknowledgements

- We build our model upon [`DeepSeek-R1-Distill-Qwen-1.5B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B).
- Training was carried out with a modified version of [verl](https://github.com/volcengine/verl).