Update README.md
README.md
CHANGED
@@ -11,6 +11,8 @@ license: apache-2.0
 
 ## Introduction
 
+![image-20241230162026156](./figures/results.png)
+
 Eurus-2-7B-PRIME is trained using **PRIME** (**P**rocess **R**einforcement through **IM**plicit r**E**ward) method, an open-source solution for online reinforcement learning (RL) with process rewards, to advance reasoning abilities of language models beyond imitation or distillation. It starts with [Eurus-2-7B-SFT](https://huggingface.co/PRIME-RL/Eurus-2-7B-SFT) and trains on [Eurus-2-RL-Data](https://huggingface.co/datasets/PRIME-RL/Eurus-2-RL-Data).
 
 <img src="./figures/prm.gif" alt="prm" style="zoom: 33%;" />
@@ -73,7 +75,6 @@ The final results are presented below:
 | OlympiadBench | 42.1 (+12.3) | 29.8 | 40.7 | 31.9 | **43.3** |
 | Avg. | **48.9 (+ 16.7)** | 32.2 | 43.8 | 36.4 | 43.3 |
 
-![image-20241230162026156](./figures/performance.jpg)
 
 We achieved this with only 1/10 data and model resources compared with Qwen-Math.
 