---
pipeline_tag: reinforcement-learning
tags:
- code
new_version: wizardII/ArcherCodeR-1.5B
---

<div align="center">

# ✨ ArcherCodeR

<div>
🏹️ Reinforcement Learning for Smarter Code Reasoning in LLMs 🎯
</div>
</div>
<div>
<br>

<div align="center">

[](https://github.com/wizard-III/ArcherCodeR)
[](https://huggingface.co/wizardII/ArcherCodeR-1.5B)
[](https://huggingface.co/datasets/wizardII/ArcherCodeR-Dataset)
[](https://wandb.ai/wangjkpkucs-peking-university/ArcherCodeR?nw=nwuserwangjkpkucs)

</div>

</div>

## 📖 Overview

<div align="center">
<img src="assets/ArcherCodeR-1.5B-DAPO.png" width="100%"/>

<sub>ArcherCodeR-1.5B-DAPO's LiveCodeBench (LCB) score improves steadily as training progresses and reaches 27.24%, the best result among models of similar size (excluding our final ArcherCodeR-1.5B model).</sub>
</div>

**ArcherCodeR** is an open-source project focused on advancing code reasoning in LLMs through scalable, rule-based reinforcement learning. It offers full-stack reproducibility, including training code, datasets, models, and logs.

- **[`ArcherCodeR-1.5B`](https://huggingface.co/wizardII/ArcherCodeR-1.5B)** achieves state-of-the-art performance on code tasks (LiveCodeBench) among models of similar size. We have released all training components of the ArcherCodeR-1.5B-DAPO model; the final model's code and training pipeline are coming soon.
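
Rule-based RL for code derives its reward from verifiable checks (test execution) rather than a learned reward model. As a minimal illustrative sketch only, a binary reward can run the candidate program against input/output test cases; the function name and test format below are hypothetical, not this project's actual reward implementation:

```python
import subprocess
import sys

def rule_based_reward(candidate_code: str, tests: list[tuple[str, str]],
                      timeout: float = 5.0) -> float:
    """Binary rule-based reward: 1.0 iff the candidate program prints the
    expected stdout for every (stdin, expected_stdout) test case."""
    for stdin_data, expected in tests:
        try:
            result = subprocess.run(
                [sys.executable, "-c", candidate_code],
                input=stdin_data, capture_output=True,
                text=True, timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return 0.0  # hanging or too-slow programs earn no reward
        if result.returncode != 0 or result.stdout.strip() != expected.strip():
            return 0.0
    return 1.0

# Toy task: read an integer, print its double.
tests = [("3\n", "6"), ("10\n", "20")]
print(rule_based_reward("print(int(input()) * 2)", tests))  # 1.0
```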

## 📊 Evaluation

<div align="center">
<img src="./assets/figure_3.jpeg" width="75%"/>
<img src="./assets/figure_2.jpeg" width="75%"/>
</div>

### ⚖️ Results

| Model | LCB (8/1/24-2/1/25) Pass@1 | LCB (8/1/24-2/1/25) Pass@4 |
| ------------------------------------- | -------------------------- | -------------------------- |
| DeepSeek-R1-Distill-Qwen-1.5B | 16.9 | — |
| DeepSeek-R1-Distill-Qwen-1.5B | 16.40 | 25.81 |
| DeepCoder-1.5B | 25.1 | — |
| DeepCoder-1.5B | 23.03 | 30.82 |
| Nemotron-Research-Reasoning-Qwen-1.5B | 23.81 | — |
| Nemotron-Research-Reasoning-Qwen-1.5B | 25.45 | 34.40 |
| ArcherCodeR-1.5B-DAPO | 25.45 | 35.13 |
| ArcherCodeR-1.5B (32k) | 28.49 | 38.71 |
| **ArcherCodeR-1.5B (48k)** | 29.30 | 39.07 |
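
Pass@k metrics like those above are conventionally computed with the standard unbiased estimator: draw k samples without replacement from n generations per problem, of which c pass. A small sketch (the per-problem sample counts below are illustrative; we do not specify the exact n used in our runs here):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples,
    drawn without replacement from n generations of which c are
    correct, solves the problem."""
    if n - c < k:  # every size-k draw must contain a correct sample
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 8 generations per problem, 2 of them correct (illustrative):
print(round(pass_at_k(8, 2, 1), 4))  # 0.25
print(round(pass_at_k(8, 2, 4), 4))  # 0.7857
```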

## Technical Report

The technical report will be released soon.

## Acknowledgements

- We build our model upon [`DeepSeek-R1-Distill-Qwen-1.5B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B).
- Training was carried out with a modified version of [verl](https://github.com/volcengine/verl).

## Citation

Please cite the following:

```bibtex
@misc{archercoder2025,
  title={ArcherCodeR},
  author={Jiakang Wang},
  note={Blog},
  year={2025}
}
```