RyanLiu112 committed · verified
Commit 0617dab · 1 Parent(s): a489541

Update README.md

Files changed (1):
  1. README.md +22 -9

README.md CHANGED
@@ -7,28 +7,41 @@ base_model:
 ---
 
 # Introduction
-We introduce GenPRM, a generative process reward model (PRM) designed to enhance process supervision performance through explicit Chain-of-Thought (CoT) reasoning and code verification. Addressing critical limitations of prior PRMs, including limited process supervision and scalability, GenPRM pioneers a novel paradigm that leverages the generative capabilities of LLMs to perform step-wise reasoning validation.
+
+We propose **GenPRM**, a strong generative process reward model with the following features:
+
+- performing explicit **CoT reasoning** and **code verification** before providing the process judgment;
+- improving Monte Carlo estimation and hard labels with **Relative Progress Estimation (RPE)**;
+- supporting GenPRM **test-time scaling** in a parallel manner with majority voting;
+- supporting policy model test-time scaling with GenPRM as **verifiers** or **critics**.
 
 GenPRM achieves state-of-the-art performance across multiple benchmarks in two key roles:
-- As a verifier: GenPRM-7B outperforms all classification-based PRMs of comparable size and even surpasses Qwen2.5-Math-PRM-72B via test-time scaling.
-- As a critic: GenPRM-7B demonstrates superior critique capabilities, achieving 3.4× greater performance gains than DeepSeek-R1-Distill-Qwen-7B after 3 refinement iterations.
+
+- **As a verifier**: GenPRM-7B outperforms all classification-based PRMs of comparable size and even surpasses **Qwen2.5-Math-PRM-72B** via test-time scaling.
+- **As a critic**: GenPRM-7B demonstrates superior critique capabilities, achieving **3.4×** greater performance gains than DeepSeek-R1-Distill-Qwen-7B after 3 refinement iterations.
 
 ![](images/fig_head.png)
 
-- Project Page: [GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning](https://ryanliu112.github.io/GenPRM/)
+- Project Page: [GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning](https://ryanliu112.github.io/GenPRM)
 - Paper: [https://arxiv.org/abs/2504.00891](https://arxiv.org/abs/2504.00891)
 - Code: [https://github.com/RyanLiu112/GenPRM](https://github.com/RyanLiu112/GenPRM)
+- Awesome Process Reward Models: [Awesome Process Reward Models](https://github.com/RyanLiu112/Awesome-Process-Reward-Models)
+- HF Paper Link: [GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning](https://hf.co/papers/2504.00891)
+- HF Collection: [GenPRM](https://hf.co/collections/GenPRM/genprm-67ee4936234ba5dd16bb9943)
 
 # Model details
-For full training details, please refer to our [paper](https://arxiv.org/abs/2504.00891).
-- Training data: the 23K conversation data are released in [GenPRM-MATH-Data](https://huggingface.co/datasets/GenPRM/GenPRM-MATH-Data).
-- Base model: we select the [DeepSeek-R1-Distill series](https://huggingface.co/deepseek-ai) (1.5B, 7B, 32B) as our base models.
 
+For full training details, please refer to our [paper](https://arxiv.org/abs/2504.00891).
+
+- Training data: the 23K SFT data are released in [GenPRM-MATH-Data](https://huggingface.co/datasets/GenPRM/GenPRM-MATH-Data).
+- Base model: we use the [DeepSeek-R1-Distill series](https://huggingface.co/deepseek-ai) (1.5B, 7B, and 32B) as our base models.
 
 # How to use
-The evaluation and testing code for GenPRM is available in our GitHub repository: [https://github.com/RyanLiu112/GenPRM](https://github.com/RyanLiu112/GenPRM)
 
-Here's a minimal example of using vLLM for GenPRM rationale generation:
+The evaluation code of GenPRM is available in our GitHub repository: [https://github.com/RyanLiu112/GenPRM](https://github.com/RyanLiu112/GenPRM).
+
+Here's a minimal example of using GenPRM for rationale generation and process supervision:
+
 ```python
 from transformers import AutoTokenizer
 from vllm import LLM, SamplingParams
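# -- The diff shown above is truncated at this point. The rest of this block is an
# -- illustrative sketch (not part of the original commit) of how a vLLM-based
# -- GenPRM call could continue. The repo id "GenPRM/GenPRM-7B", the prompt format,
# -- and the sampling settings are assumptions; see the official evaluation code at
# -- https://github.com/RyanLiu112/GenPRM for the exact usage.
model_path = "GenPRM/GenPRM-7B"  # assumed Hub repo id
tokenizer = AutoTokenizer.from_pretrained(model_path)
llm = LLM(model=model_path)
sampling_params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=4096)

# A problem plus a candidate step-by-step solution whose steps should be judged.
problem = "What is 12 * 13?"
solution = "Step 1: 12 * 13 = 12 * 10 + 12 * 3 = 120 + 36 = 156.\nStep 2: The answer is 156."

messages = [{"role": "user", "content": f"Problem:\n{problem}\n\nSolution:\n{solution}"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# GenPRM first generates a CoT rationale (optionally with code verification)
# and then emits its process judgment for the solution steps.
outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)
```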
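The updated introduction above mentions scaling GenPRM at test time by sampling rationales in parallel and majority-voting their judgments, and using GenPRM as a verifier to select among candidate solutions. The following is a minimal sketch of what that aggregation could look like; the Yes/No parsing format, the sampling settings, and the step-fraction scoring rule are illustrative assumptions, not the released implementation.

```python
import re
from collections import Counter

from transformers import AutoTokenizer
from vllm import LLM, SamplingParams


def extract_judgment(rationale: str) -> str:
    """Take the last Yes/No mention in a rationale as its judgment (format assumed)."""
    matches = re.findall(r"\b(yes|no)\b", rationale, re.IGNORECASE)
    return matches[-1].capitalize() if matches else "No"


def vote_on_step(llm: LLM, prompt: str, n: int = 8) -> str:
    """Parallel test-time scaling: sample n rationales and majority-vote their judgments."""
    params = SamplingParams(n=n, temperature=0.8, top_p=0.95, max_tokens=2048)
    completions = llm.generate([prompt], params)[0].outputs
    votes = Counter(extract_judgment(c.text) for c in completions)
    return votes.most_common(1)[0][0]


def select_best_solution(llm: LLM, tokenizer, problem: str, candidates: list[list[str]]) -> list[str]:
    """Verifier-style Best-of-N: rank candidates by the fraction of steps voted correct."""
    def score(steps: list[str]) -> float:
        correct = 0
        for t in range(1, len(steps) + 1):
            content = f"Problem:\n{problem}\n\nSolution so far:\n" + "\n".join(steps[:t])
            messages = [{"role": "user", "content": content}]
            prompt = tokenizer.apply_chat_template(
                messages, tokenize=False, add_generation_prompt=True
            )
            correct += vote_on_step(llm, prompt) == "Yes"
        return correct / max(len(steps), 1)

    return max(candidates, key=score)
```

Because the n rationales for a step are independent, they can be produced in a single batched vLLM call (the `n` field of `SamplingParams`), which is what makes this parallel form of test-time scaling cheap to run.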
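The feature list also credits Relative Progress Estimation (RPE) with improving on raw Monte Carlo estimation and hard labels when building process supervision data. The exact formulation is given in the [paper](https://arxiv.org/abs/2504.00891); the snippet below is only a loose sketch under the assumption that RPE labels a step by how the Monte Carlo success estimate changes relative to the previous prefix, rather than by thresholding the absolute estimate.

```python
# Loose sketch of the idea behind Relative Progress Estimation (RPE); the exact
# definition is in the paper (https://arxiv.org/abs/2504.00891). Assumption here:
# a step label reflects the change in Monte Carlo success rate, not its absolute value.
from typing import Callable, Sequence


def mc_success_rate(prefix: Sequence[str],
                    rollout_correct: Callable[[Sequence[str]], list]) -> float:
    """Monte Carlo estimate: fraction of policy rollouts from this prefix that end correct."""
    results = rollout_correct(prefix)  # hypothetical rollout helper supplied by the caller
    return sum(results) / max(len(results), 1)


def rpe_step_labels(steps: Sequence[str], rollout_correct, threshold: float = 0.0) -> list:
    """Label each step by its relative progress over the previous prefix (sketch only)."""
    labels, prev = [], mc_success_rate([], rollout_correct)
    for t in range(1, len(steps) + 1):
        cur = mc_success_rate(steps[:t], rollout_correct)
        labels.append(int(cur - prev >= threshold))  # a hard MC label would threshold `cur` alone
        prev = cur
    return labels
```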