ganqu committed on
Commit 7a07ccf · verified · 1 Parent(s): 04a8ad3

Update README.md

Files changed (1)
  1. README.md +2 -1
README.md CHANGED
@@ -11,6 +11,8 @@ license: apache-2.0
 
  ## Introduction
 
+ ![image-20241230162026156](./figures/results.png)
+
  Eurus-2-7B-PRIME is trained using **PRIME** (**P**rocess **R**einforcement through **IM**plicit r**E**ward) method, an open-source solution for online reinforcement learning (RL) with process rewards, to advance reasoning abilities of language models beyond imitation or distillation. It starts with [Eurus-2-7B-SFT](https://huggingface.co/PRIME-RL/Eurus-2-7B-SFT) and trains on [Eurus-2-RL-Data](https://huggingface.co/datasets/PRIME-RL/Eurus-2-RL-Data).
 
  <img src="./figures/prm.gif" alt="prm" style="zoom: 33%;" />
@@ -73,7 +75,6 @@ The final results are presented below:
  | OlympiadBench | 42.1 (+12.3) | 29.8 | 40.7 | 31.9 | **43.3** |
  | Avg. | **48.9 (+ 16.7)** | 32.2 | 43.8 | 36.4 | 43.3 |
 
- ![image-20241230162026156](./figures/performance.jpg)
 
  We achieved this with only 1/10 data and model resources compared with Qwen-Math.
 