PRIME-RL/Eurus-2-7B-PRIME


ganqu committed 11 days ago

Commit 7a07ccf · verified · 1 Parent(s): 04a8ad3

Update README.md

Files changed (1)

README.md +2 -1

@@ -11,6 +11,8 @@ license: apache-2.0
 ## Introduction
+![image-20241230162026156](./figures/results.png)
 Eurus-2-7B-PRIME is trained using **PRIME** (**P**rocess **R**einforcement through **IM**plicit r**E**ward) method, an open-source solution for online reinforcement learning (RL) with process rewards, to advance reasoning abilities of language models beyond imitation or distillation. It starts with [Eurus-2-7B-SFT](https://huggingface.co/PRIME-RL/Eurus-2-7B-SFT) and trains on [Eurus-2-RL-Data](https://huggingface.co/datasets/PRIME-RL/Eurus-2-RL-Data).
 <img src="./figures/prm.gif" alt="prm" style="zoom: 33%;" />
@@ -73,7 +75,6 @@ The final results are presented below:
 | OlympiadBench | 42.1 (+12.3)         | 29.8               | 40.7                          | 31.9                       | **43.3**   |
 | Avg.          | **48.9 (+ 16.7)**    | 32.2               | 43.8                          | 36.4                       | 43.3       |
-![image-20241230162026156](./figures/performance.jpg)
 We achieved this with only 1/10 data and model resources compared with Qwen-Math.