---
license: apache-2.0
---
# EurusPRM-Stage2
## Links
- 📜 [Blog]()
- 🤗 [PRIME Collection](https://huggingface.co/PRIME-RL)
- 🤗 [Training Data](https://huggingface.co/datasets/PRIME-RL/EurusPRM-Stage2-Data)
## Introduction
EurusPRM-Stage2 is trained using **[Implicit PRM](https://arxiv.org/abs/2412.01981)**, which obtains process rewards at no additional labeling cost: it only requires training an ORM on the cheaper response-level labels. During inference, implicit process rewards are obtained with a single forward pass, by calculating the log-likelihood ratio at each step.
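
To make the inference-time computation concrete, here is a minimal sketch of the per-step log-likelihood ratio. It assumes per-token log-probabilities of the response have already been collected from the trained model and from a reference model. The function name `implicit_step_rewards`, its arguments, and the scaling coefficient `beta` are illustrative assumptions, not part of the released code.

```python
from typing import List, Sequence


def implicit_step_rewards(
    policy_logprobs: Sequence[float],
    ref_logprobs: Sequence[float],
    step_ends: Sequence[int],
    beta: float = 1.0,
) -> List[float]:
    """Sum the token-level log-likelihood ratio within each reasoning step.

    policy_logprobs / ref_logprobs: log-probability of each response token
        under the trained model and the reference model (hypothetical inputs).
    step_ends: exclusive end index of each step, in increasing order.
    beta: scaling coefficient (assumed; the value here is only a placeholder).
    """
    if len(policy_logprobs) != len(ref_logprobs):
        raise ValueError("both models must score the same tokens")

    rewards = []
    start = 0
    for end in step_ends:
        if not start < end <= len(policy_logprobs):
            raise ValueError("step_ends must be increasing and within range")
        # Log-likelihood ratio of the tokens belonging to this step.
        ratio = sum(
            p - r
            for p, r in zip(policy_logprobs[start:end], ref_logprobs[start:end])
        )
        rewards.append(beta * ratio)
        start = end
    return rewards


# Toy example: three tokens split into two steps (tokens 0-1, then token 2).
example = implicit_step_rewards(
    policy_logprobs=[-1.0, -2.0, -0.5],
    ref_logprobs=[-1.5, -2.5, -0.5],
    step_ends=[2, 3],
    beta=0.5,
)
```

In practice the two log-probability sequences would come from one forward pass each over the same prompt and response; how steps are delimited (for example, by newlines) is a separate choice that this sketch leaves to the caller.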
The key ingredient of Implicit PRM is the reward representation, as demonstrated below: