Commit a0fa697 (verified) by RyanLiu112
1 parent: 0617dab

Update README.md

Files changed (1)
  1. README.md +4 -2
README.md CHANGED
@@ -4,13 +4,15 @@ datasets:
  - GenPRM/GenPRM-MATH-Data
  base_model:
  - deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
+ language:
+ - en
  ---
  
  # Introduction
  
  We propose **GenPRM**, a strong generative process reward model with the following features:
  
- - reasoning with explicit **CoT reasoning** and **code verfication** before providing the process judgment;
+ - performing explicit **CoT reasoning** and **code verfication** before providing the process judgment;
  - improving Monte Carlo estimation and hard label with **Relative Progress Estimation (RPE)**;
  - supporting GenPRM **test-time scaling** in a parallel manner with majority voting;
  - supporting policy model test-time scaling with GenPRM as **verifiers** or **critics**.
@@ -106,4 +108,4 @@ Our recent work on LLM test-time scaling with PRMs:
  journal = {arXiv preprint arXiv:2502.06703},
  year = {2025}
  }
- ```
+ ```