GenPRM/GenPRM-1.5B

RyanLiu112 committed about 24 hours ago

Commit a0fa697 (verified) · 1 parent: 0617dab

Update README.md

Files changed (1)

README.md +4 -2

@@ -4,13 +4,15 @@ datasets:
 - GenPRM/GenPRM-MATH-Data
 base_model:
 - deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
+language:
+- en
 ---
 # Introduction
 We propose **GenPRM**, a strong generative process reward model with the following features:
-- reasoning with explicit **CoT reasoning** and **code verification** before providing the process judgment;
+- performing explicit **CoT reasoning** and **code verification** before providing the process judgment;
 - improving Monte Carlo estimation and hard label with **Relative Progress Estimation (RPE)**;
 - supporting GenPRM **test-time scaling** in a parallel manner with majority voting;
 - supporting policy model test-time scaling with GenPRM as **verifiers** or **critics**.
@@ -106,4 +108,4 @@ Our recent work on LLM test-time scaling with PRMs:
     journal = {arXiv preprint arXiv:2502.06703},
     year    = {2025}
 }
-```
+```
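
The README mentions scaling GenPRM at test time "in a parallel manner with majority voting". A minimal sketch of that aggregation step, assuming each parallel sample yields a discrete label for the same reasoning step; the function name `majority_vote` and the labels `"correct"` / `"incorrect"` are hypothetical and not taken from the GenPRM code:

```python
from collections import Counter


def majority_vote(judgments):
    """Return the most common label and its share of the votes.

    `judgments` is a list of labels, one per independently sampled
    verification of the same reasoning step (hypothetical format).
    """
    if not judgments:
        raise ValueError("need at least one judgment")
    counts = Counter(judgments)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(judgments)


# Hypothetical labels from five parallel samples for one step.
samples = ["correct", "correct", "incorrect", "correct", "incorrect"]
label, share = majority_vote(samples)
print(label, share)
```

The vote share can serve as a soft score for the step. How the actual model aggregates its samples may differ from this sketch.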