wizardII committed on
Commit 3dededa · verified · 1 Parent(s): 2b2fae7

Update README.md

Files changed (1): README.md +84 -1

@@ -8,4 +8,87 @@ pipeline_tag: reinforcement-learning
tags:
- code
new_version: wizardII/ArcherCodeR-1.5B
---

<div align="center">

# ✨ ArcherCodeR

<div>
🏹️ Reinforcement Learning for Smarter Code Reasoning in LLMs 🎯
</div>
</div>
<div>
<br>

<div align="center">

[![Github](https://img.shields.io/badge/Code-000000?style=for-the-badge&logo=github&logoColor=000&logoColor=white)](https://github.com/wizard-III/ArcherCodeR)
[![Model](https://img.shields.io/badge/Model-fcd022?style=for-the-badge&logo=huggingface&logoColor=000&labelColor)](https://huggingface.co/wizardII/ArcherCodeR-1.5B)
[![Data](https://img.shields.io/badge/Data-fcd022?style=for-the-badge&logo=huggingface&logoColor=000&labelColor)](https://huggingface.co/datasets/wizardII/ArcherCodeR-Dataset)
[![Wandb](https://img.shields.io/badge/Wandb-000000?style=for-the-badge&logo=Wandb&logoColor=000&labelColor)](https://wandb.ai/wangjkpkucs-peking-university/ArcherCodeR?nw=nwuserwangjkpkucs)

</div>

</div>

## 📖 Overview

<div align="center">
<img src="assets/ArcherCodeR-1.5B-DAPO.png" width="100%"/>

<sub>ArcherCodeR-1.5B-DAPO's LiveCodeBench (LCB) score improves steadily as training progresses, reaching 27.24%, the best result among models of similar size (excluding our final ArcherCodeR-1.5B model).</sub>
</div>

**ArcherCodeR** is an open-source project focused on advancing code reasoning in LLMs through scalable, rule-based reinforcement learning. It offers full-stack reproducibility, including training code, datasets, models, and logs.

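The README does not spell out its reward function. As a minimal sketch, assuming a binary outcome reward computed by running each generated Python program against stdin/stdout test cases (a common rule-based setup for code RL), the idea looks like this. `IOTest`, `run_candidate`, and `rule_based_reward` are hypothetical names, not part of the ArcherCodeR codebase, and the sketch performs no sandboxing.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class IOTest:  # hypothetical container for one stdin/stdout test case
    stdin: str
    expected_stdout: str


def run_candidate(code: str, test: IOTest, timeout_s: float = 5.0) -> bool:
    """Run a Python program in a subprocess; True if its output matches the test.

    NOTE: no sandboxing here. A real RL pipeline should isolate untrusted model output.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            input=test.stdin,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False  # a timeout counts as a failed test
    if result.returncode != 0:
        return False  # a crash or non-zero exit counts as a failed test
    # Compare outputs, ignoring leading/trailing whitespace.
    return result.stdout.strip() == test.expected_stdout.strip()


def rule_based_reward(code: str, tests: list[IOTest]) -> float:
    """Hypothetical binary outcome reward: 1.0 if every test passes, otherwise 0.0."""
    return 1.0 if all(run_candidate(code, t) for t in tests) else 0.0


# Usage: a correct and an incorrect "add two integers" solution.
tests = [IOTest("2 3\n", "5"), IOTest("10 -4\n", "6")]
correct = "a, b = map(int, input().split())\nprint(a + b)"
wrong = "a, b = map(int, input().split())\nprint(a * b)"
print(rule_based_reward(correct, tests))  # 1.0
print(rule_based_reward(wrong, tests))    # 0.0
```

An all-or-nothing reward is hard to game (partial output matches earn nothing), at the cost of a sparse signal on hard problems.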
- **[`ArcherCodeR-1.5B`](https://huggingface.co/wizardII/ArcherCodeR-1.5B)** achieves state-of-the-art LiveCodeBench performance among models of similar size. We have released all training components of the ArcherCodeR-1.5B-DAPO model; the code and training pipeline for the final model are coming soon.

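The `-DAPO` suffix suggests this model was trained with DAPO (Decoupled Clip and Dynamic sAmpling Policy Optimization). The plain-Python sketch below illustrates three of DAPO's published ideas: group-normalized advantages, dynamic sampling that drops groups whose rewards are all identical, and a token-level loss with a decoupled ("clip-higher") range. It illustrates the general technique and is not ArcherCodeR's training code: the function names are hypothetical, the clip values 0.2 and 0.28 are the settings reported in the DAPO paper, and DAPO's overlong reward shaping is omitted.

```python
import math
from statistics import mean, pstdev


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Standardize each response's reward against the other samples for the same prompt."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def keep_group(rewards: list[float]) -> bool:
    """Dynamic sampling filter: if every reward in a group is identical (all pass or
    all fail), every advantage is zero and the group adds no gradient, so drop it."""
    return len(set(rewards)) > 1


def clipped_token_objective(log_ratio: float, advantage: float,
                            eps_low: float = 0.2, eps_high: float = 0.28) -> float:
    """PPO-style per-token surrogate with a decoupled clip range [1 - eps_low, 1 + eps_high]."""
    ratio = math.exp(log_ratio)  # pi_new(token) / pi_old(token)
    clipped = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
    return min(ratio * advantage, clipped * advantage)


def token_level_loss(batch: list[tuple[list[float], float]]) -> float:
    """batch holds (per-token log-ratios, advantage) for each kept response.

    The objective is averaged over all tokens in the batch rather than per sequence,
    so longer responses carry proportionally more weight.
    """
    total, n_tokens = 0.0, 0
    for log_ratios, advantage in batch:
        for lr in log_ratios:
            total += clipped_token_objective(lr, advantage)
            n_tokens += 1
    if n_tokens == 0:
        raise ValueError("batch contains no tokens")
    return -total / n_tokens  # negate: optimizers minimize the loss


# Usage: one prompt, four sampled solutions scored by a binary reward.
rewards = [1.0, 0.0, 0.0, 1.0]
if keep_group(rewards):
    advs = group_advantages(rewards)  # roughly [1, -1, -1, 1]
    # With an unchanged policy every log-ratio is 0, so each token contributes its advantage.
    batch = [([0.0] * 3, a) for a in advs]
    print(token_level_loss(batch))
```

Raising only the upper clip bound lets low-probability tokens with positive advantage grow faster, which the DAPO authors motivate as a way to keep exploration from collapsing.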
## 📊 Evaluation

<div align="center">
<img src="./assets/figure_3.jpeg" width="75%"/>
<img src="./assets/figure_2.jpeg" width="75%"/>
</div>

### ⚖️ LiveCodeBench Results

LiveCodeBench (LCB) scores on problems released between 2024-08-01 and 2025-02-01:

| Model | Pass@1 (%) | Pass@4 (%) |
| ------------------------------------- | ----- | ----- |
| DeepSeek-R1-Distill-Qwen-1.5B | 16.9 | n/a |
| DeepSeek-R1-Distill-Qwen-1.5B | 16.40 | 25.81 |
| DeepCoder-1.5B | 25.1 | n/a |
| DeepCoder-1.5B | 23.03 | 30.82 |
| Nemotron-Research-Reasoning-Qwen-1.5B | 23.81 | n/a |
| Nemotron-Research-Reasoning-Qwen-1.5B | 25.45 | 34.40 |
| ArcherCodeR-1.5B-DAPO | 25.45 | 35.13 |
| ArcherCodeR-1.5B (32k) | 28.49 | 38.71 |
| **ArcherCodeR-1.5B (48k)** | 29.30 | 39.07 |

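The table reports Pass@1 and Pass@4, but the README does not say how many samples were generated per problem. Assuming n ≥ k generations per problem, of which c pass all tests, the standard unbiased estimator from the Codex/HumanEval paper (Chen et al., 2021) is pass@k = 1 - C(n-c, k) / C(n, k), averaged over problems. A minimal sketch with hypothetical function names:

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem with n samples, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    # 1 - P(all k drawn samples are incorrect)
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Mean pass@k over problems, in percent; results holds (n, c) per problem."""
    return 100.0 * sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Usage: three problems, 8 samples each, with 2, 0 and 8 correct samples.
results = [(8, 2), (8, 0), (8, 8)]
print(benchmark_pass_at_k(results, k=1))  # ≈ 41.67
print(benchmark_pass_at_k(results, k=4))  # ≈ 59.52
```

For k = 1 the estimator reduces to c/n, the fraction of correct samples, so Pass@1 is simply the mean per-sample accuracy.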
## Technical Report

The technical report will be released soon.

## Acknowledgements

- We build our model upon [`DeepSeek-R1-Distill-Qwen-1.5B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B).
- Training was carried out with a modified version of [verl](https://github.com/volcengine/verl).

## Citation

Please cite the following:

```bibtex
@misc{archercoder2025,
  title={ArcherCodeR},
  author={Jiakang Wang},
  note={Blog},
  year={2025}
}
```