Update README.md
README.md CHANGED
```diff
@@ -7,28 +7,41 @@ base_model:
 ---

 # Introduction
-
+We propose **GenPRM**, a strong generative process reward model with the following features:
+
+- performing explicit **CoT reasoning** and **code verification** before providing the process judgment;
+- improving Monte Carlo estimation and hard labels with **Relative Progress Estimation (RPE)**;
+- supporting GenPRM **test-time scaling** in a parallel manner with majority voting;
+- supporting policy model test-time scaling with GenPRM as **verifiers** or **critics**.

 GenPRM achieves state-of-the-art performance across multiple benchmarks in two key roles:
-
-- As a
+- **As a verifier**: GenPRM-7B outperforms all classification-based PRMs of comparable size and even surpasses **Qwen2.5-Math-PRM-72B** via test-time scaling.
+- **As a critic**: GenPRM-7B demonstrates superior critique capabilities, achieving **3.4×** greater performance gains than DeepSeek-R1-Distill-Qwen-7B after 3 refinement iterations.

 [figure]

-- Project Page: [GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning](https://ryanliu112.github.io/GenPRM
+- Project Page: [GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning](https://ryanliu112.github.io/GenPRM)
 - Paper: [https://arxiv.org/abs/2504.00891](https://arxiv.org/abs/2504.00891)
 - Code: [https://github.com/RyanLiu112/GenPRM](https://github.com/RyanLiu112/GenPRM)
+- Awesome Process Reward Models: [Awesome Process Reward Models](https://github.com/RyanLiu112/Awesome-Process-Reward-Models)
+- HF Paper Link: [GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning](https://hf.co/papers/2504.00891)
+- HF Collection: [GenPRM](https://hf.co/collections/GenPRM/genprm-67ee4936234ba5dd16bb9943)

 # Model details

-For full training details, please refer to our [paper](https://arxiv.org/abs/2504.00891)
-- Training data: the 23K conversation data are released in [GenPRM-MATH-Data](https://huggingface.co/datasets/GenPRM/GenPRM-MATH-Data).
-- Base model: we select the [DeepSeek-R1-Distill series](https://huggingface.co/deepseek-ai) (1.5B, 7B, 32B) as our base models
+For full training details, please refer to our [paper](https://arxiv.org/abs/2504.00891).
+
+- Training data: 23K SFT data is released in [GenPRM-MATH-Data](https://huggingface.co/datasets/GenPRM/GenPRM-MATH-Data).
+- Base model: we use the [DeepSeek-R1-Distill series](https://huggingface.co/deepseek-ai) (1.5B, 7B, and 32B) as our base models.

 # How to use

-The evaluation and testing code for GenPRM are available in our GitHub repository: [https://github.com/RyanLiu112/GenPRM](https://github.com/RyanLiu112/GenPRM)
-
+The evaluation code of GenPRM is available in our GitHub repository: [https://github.com/RyanLiu112/GenPRM](https://github.com/RyanLiu112/GenPRM).
+
+Here's a minimal example of using GenPRM for rationale generation and process supervision:
+
 ```python
 from transformers import AutoTokenizer
 from vllm import LLM, SamplingParams
```
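The hunk ends inside the usage example, so only the imports appear above. As a rough sketch only (the model id `GenPRM/GenPRM-7B`, the prompt wording, and the `\boxed{...}` answer parsing are assumptions rather than the repository's documented interface), a parallel, majority-voted verification call with vLLM might look like this:

```python
import re
from collections import Counter

from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

# NOTE: hypothetical checkpoint id; pick the actual model from the GenPRM collection on Hugging Face.
MODEL_PATH = "GenPRM/GenPRM-7B"

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
llm = LLM(model=MODEL_PATH)

# Illustrative verification prompt; the real template is defined in the GitHub repository.
problem = "Janet has 3 apples and buys 2 more. How many apples does she have?"
step = "3 + 2 = 5, so Janet has 5 apples."
messages = [{
    "role": "user",
    "content": (
        f"Problem:\n{problem}\n\nStep to verify:\n{step}\n\n"
        "Reason step by step (you may write and check code), then answer "
        "\\boxed{1} if the step is correct and \\boxed{0} otherwise."
    ),
}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Sample several rationales in parallel and majority-vote the final judgment,
# a simple form of the parallel test-time scaling described in the introduction.
sampling_params = SamplingParams(n=8, temperature=0.7, max_tokens=2048)
completions = llm.generate([prompt], sampling_params)[0].outputs

votes = []
for completion in completions:
    match = re.search(r"\\boxed\{([01])\}", completion.text)
    if match:
        votes.append(match.group(1))

verdict = Counter(votes).most_common(1)[0][0] if votes else "0"
print(f"votes={votes} -> step judged {'correct' if verdict == '1' else 'incorrect'}")
```

Sampling several rationales and voting over their judgments mirrors the majority-voting test-time scaling described in the introduction; refer to the GitHub repository for the actual prompts and evaluation scripts.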