wizardII committed
Commit f25a52d · verified · 1 Parent(s): 20bfd19

Update README.md

Files changed (1)
  1. README.md +23 -25
README.md CHANGED
@@ -11,14 +11,14 @@ new_version: wizardII/ArcherCodeR-1.5B
 ---
 
 
-
 <div align="center">
 
 # ✨ ArcherCodeR
 
 <div>
-🏹️ Reinforcement Learning for Smarter Code Reasoning in LLMs 🎯
+🏹️ Reinforcement Learning for Enhanced Code Reasoning in LLMs 🎯
 </div>
+
 </div>
 <div>
 <br>
@@ -29,26 +29,34 @@ new_version: wizardII/ArcherCodeR-1.5B
 [![Model](https://img.shields.io/badge/Model-fcd022?style=for-the-badge&logo=huggingface&logoColor=000&labelColor)](https://huggingface.co/wizardII/ArcherCodeR-1.5B)
 [![Data](https://img.shields.io/badge/Data-fcd022?style=for-the-badge&logo=huggingface&logoColor=000&labelColor)](https://huggingface.co/datasets/wizardII/ArcherCodeR-Dataset)
 [![Wandb](https://img.shields.io/badge/Wandb-000000?style=for-the-badge&logo=Wandb&logoColor=000&labelColor)](https://wandb.ai/wangjkpkucs-peking-university/ArcherCodeR?nw=nwuserwangjkpkucs)
+[![Zhihu](https://img.shields.io/badge/知乎-0084FF?style=for-the-badge&logo=zhihu&logoColor=white)](https://zhuanlan.zhihu.com/p/xxx)
 
 </div>
 
-</div>
-
-
-## 📖 Overview
+## Overview
 
 <div align="center">
 <img src="assets/ArcherCodeR-1.5B-DAPO.png" width="100%"/>
 
-<sub>ArcherCodeR-1.5B-DAPO’s LiveCodeBench (LCB) score improves steadily as training progresses and achieves a 27.24% LCB score.</sub>
+<sub>ArcherCodeR-1.5B-DAPO improves steadily on LiveCodeBench (LCB) as training progresses, reaching a 27.24% LCB score.</sub>
 </div>
 
-**ArcherCodeR** is an open-source project focused on advancing code reasoning in LLMs through scalable, rule-based reinforcement learning. It offers full-stack reproducibility, including training code, datasets, models, and logs.
+**ArcherCodeR** is an open-source initiative for enhancing code reasoning in large language models through scalable, rule-based reinforcement learning. We provide full-stack reproducibility, including:
+
+- Training code and pipelines
+- Curated datasets
+- Trained models
+- Complete training logs
 
-- **[`ArcherCodeR-1.5B`](https://huggingface.co/wizardII/ArcherCodeR-1.5B)** achieves state-of-the-art performance on code tasks (LiveCodeBench) among models of similar size. We have currently released all training components of the ArcherCodeR-1.5B-DAPO model, with the final model code and training pipeline coming soon.
+**Current Models**:
+- **[ArcherCodeR-1.5B-DAPO](https://huggingface.co/wizardII/ArcherCodeR-1.5B-DAPO)**: state-of-the-art performance on code tasks (LiveCodeBench) among comparable-scale models, excluding our final ArcherCodeR-1.5B. All training components for this model are now fully released.
+- **[ArcherCodeR-1.5B](https://huggingface.co/wizardII/ArcherCodeR-1.5B)**: state-of-the-art among similarly sized models; the training pipeline is being released progressively.
 
+## Evaluation
 
-### ⚖️ Evaluation
+Performance on LiveCodeBench. Pass@1 is the average over 4 independent samples per problem. To ensure consistency, we re-evaluated all comparable open-source models with identical evaluation scripts and parameters (temperature=0.8, max_gen_length=32k).
+
+The detailed results are shown in the table below.
 
 | Model | LCB (8/1/24-2/1/25) (Pass@1) | LCB (8/1/24-2/1/25) (Pass@4) |
 | --------------------------------------------- | --------------------------- | --------------------------- |
@@ -62,25 +70,15 @@ new_version: wizardII/ArcherCodeR-1.5B
 | **ArcherCodeR-1.5B(32k)** | 28.49 | 38.71 |
 | **ArcherCodeR-1.5B(48k)** | 29.30 | 39.07 |
 
+Notes:
+1. Evaluation variance for the same model is typically within ±0.5 across repeated runs.
+2. DeepCoder consistently scored around 23 in our tests, lower than its reported performance.
+3. NVIDIA's Nemotron-Research-Reasoning-Qwen-1.5B slightly outperformed its reported score, potentially due to different parameter settings in the original evaluation.
 
 ## Technical Report
-The technical report will be released soon.
+Coming soon.
 
 ## Acknowledgements
 
 - We build our model upon [`DeepSeek-R1-Distill-Qwen-1.5B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B).
 - Training was carried out with a modified version of [verl](https://github.com/volcengine/verl).
-
-
-## Citation
-
-Please cite the following:
-
-```bibtex
-@misc{archercoder2025,
-  title={ArcherCodeR},
-  author={Jiakang Wang},
-  note={Blog},
-  year={2025}
-}
-```
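The 32k/48k rows in the updated table differ only in the generation-length cap, and the new Evaluation text fixes the decoding parameters at temperature=0.8 with a 32k generation budget. Purely as an illustration of those stated parameters, and assuming a vLLM-style runner (this commit does not include the actual evaluation scripts, so the stack and prompt below are placeholders, not the project's released code):

```python
# Hypothetical reproduction sketch of the stated decoding setup.
# Assumptions: vLLM as the inference engine, a placeholder prompt.
from vllm import LLM, SamplingParams

llm = LLM(model="wizardII/ArcherCodeR-1.5B")
params = SamplingParams(
    n=4,              # 4 independent samples per problem, for Pass@1/Pass@4
    temperature=0.8,  # evaluation temperature stated in the README
    max_tokens=32768, # "32k" generation budget; the 48k row raises this cap
)

outputs = llm.generate(["<one LiveCodeBench prompt here>"], params)
for sample in outputs[0].outputs:
    print(sample.text[:200])  # inspect the start of each of the 4 samples
```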
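The Pass@1 column is described as an average over the 4 samples, and Pass@4 follows from the same grading results. A minimal sketch of scoring with the standard unbiased pass@k estimator (Chen et al., 2021); the `results` numbers and names are illustrative, not taken from the ArcherCodeR evaluation scripts:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples drawn
    without replacement from n generations is correct, given that c of
    the n generations passed all tests."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# One (n, c) pair per problem: n samples generated, c of them correct.
# Illustrative values; real counts come from grading 4 samples per
# LiveCodeBench problem.
results = [(4, 2), (4, 0), (4, 4), (4, 1)]

pass1 = sum(pass_at_k(n, c, 1) for n, c in results) / len(results)
pass4 = sum(pass_at_k(n, c, 4) for n, c in results) / len(results)
print(f"Pass@1 = {pass1:.2%}  Pass@4 = {pass4:.2%}")
```

With n = k = 4 the estimator reduces to "at least one of the four samples passed", and with k = 1 it reduces to c/4 averaged over problems, matching the Pass@1 definition in the Evaluation section.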