Update README.md
README.md CHANGED
@@ -11,14 +11,14 @@ new_version: wizardII/ArcherCodeR-1.5B
---

<div align="center">

# ✨ ArcherCodeR

<div>
🏹️ Reinforcement Learning for Enhanced Code Reasoning in LLMs 🎯
</div>

</div>
<div>
<br>
@@ -29,26 +29,34 @@ new_version: wizardII/ArcherCodeR-1.5B
[Model](https://huggingface.co/wizardII/ArcherCodeR-1.5B)
[Dataset](https://huggingface.co/datasets/wizardII/ArcherCodeR-Dataset)
[W&B logs](https://wandb.ai/wangjkpkucs-peking-university/ArcherCodeR?nw=nwuserwangjkpkucs)
[Blog](https://zhuanlan.zhihu.com/p/xxx)

</div>

## Overview

<div align="center">
<img src="assets/ArcherCodeR-1.5B-DAPO.png" width="100%"/>

<sub>ArcherCodeR-1.5B-DAPO achieves progressive improvements on LiveCodeBench (LCB), reaching a 27.24% LCB score.</sub>
</div>

**ArcherCodeR** is an open-source initiative enhancing code reasoning in large language models through scalable, rule-governed reinforcement learning. We provide full-stack reproducibility, including:

- Training code and pipelines
- Curated datasets
- Trained models
- Complete training logs

**Current Models**:

- **[ArcherCodeR-1.5B-DAPO](https://huggingface.co/wizardII/ArcherCodeR-1.5B-DAPO)**: achieves state-of-the-art performance on code tasks (LiveCodeBench) among comparable-scale models (excluding our final ArcherCodeR-1.5B). All training components for this model are now fully released.
- **[ArcherCodeR-1.5B](https://huggingface.co/wizardII/ArcherCodeR-1.5B)**: SOTA among similarly sized models; its training pipeline is being released progressively.
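
Both checkpoints are hosted on the Hugging Face Hub, so a quick way to try them is the standard `transformers` API. A minimal sketch, assuming `transformers` and `torch` are installed; the prompt and generation settings below are illustrative, not our evaluation configuration:

```python
# Minimal sketch: load ArcherCodeR-1.5B from the Hugging Face Hub and
# sample one completion. Generation settings here are illustrative
# assumptions, not the repo's exact evaluation setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "wizardII/ArcherCodeR-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Write a Python function that checks whether a string is a palindrome."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=1024, do_sample=True, temperature=0.8)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```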

## Evaluation

Performance on LiveCodeBench: the Pass@1 metric represents the average performance across 4 independent sampling attempts. To ensure consistency, we re-evaluated all comparable open-source models using identical evaluation scripts and parameters (temperature=0.8, max_gen_length=32k).

The detailed results are shown in the table below.

| Model | LCB (8/1/24-2/1/25) Pass@1 | LCB (8/1/24-2/1/25) Pass@4 |
| --------------------------------------------- | --------------------------- | --------------------------- |
@@ -62,25 +70,15 @@ new_version: wizardII/ArcherCodeR-1.5B
| **ArcherCodeR-1.5B (32k)** | 28.49 | 38.71 |
| **ArcherCodeR-1.5B (48k)** | 29.30 | 39.07 |

Note:

1. Evaluation variance for the same model is typically within ±0.5 across multiple runs.
2. DeepCoder consistently scored around 23 in our tests, lower than its reported performance.
3. NVIDIA's Nemotron-Research-Reasoning-Qwen-1.5B slightly outperformed its reported score, potentially due to different parameter settings in their original evaluation.
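
For clarity on how the table's metrics are derived, here is a minimal sketch of computing Pass@1 and Pass@4 from per-problem correctness over n=4 samples, using the standard unbiased Pass@k estimator; this illustrates the metric definitions, not our exact evaluation script:

```python
# Minimal sketch of Pass@1 / Pass@4 over n=4 samples per problem.
# results[i][j] is True if sample j for problem i passed all tests.
# Illustrative only, not the exact evaluation script used in this repo.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@k estimator for one problem: probability that at
    least one of k samples is correct, given c of n samples passed."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

results = [
    [True, False, True, False],    # problem 0: 2/4 samples correct
    [False, False, False, False],  # problem 1: 0/4 samples correct
    [True, True, True, True],      # problem 2: 4/4 samples correct
]

n = 4
# Pass@1 reduces to the mean correctness across the 4 attempts.
pass1 = sum(pass_at_k(n, sum(r), 1) for r in results) / len(results)
# Pass@4 is whether any of the 4 samples solved the problem.
pass4 = sum(pass_at_k(n, sum(r), 4) for r in results) / len(results)
print(f"Pass@1 = {pass1:.2%}, Pass@4 = {pass4:.2%}")
```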

## Technical Report

Coming soon.

## Acknowledgements

- We build our model upon [`DeepSeek-R1-Distill-Qwen-1.5B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B).
- Training was carried out with a modified version of [verl](https://github.com/volcengine/verl).