---
license: apache-2.0
---

# EurusPRM-Stage2

## Links

- 📜 [Blog]()
- 🤗 [PRIME Collection](https://huggingface.co/PRIME-RL)
- 🤗 [Training Data](https://huggingface.co/datasets/PRIME-RL/EurusPRM-Stage2-Data)

## Introduction

EurusPRM-Stage2 is trained using **[Implicit PRM](https://arxiv.org/abs/2412.01981)**, which yields process rewards at no additional cost: it only requires training an ORM on the cheaper response-level labels. During inference, implicit process rewards are obtained by running a forward pass and computing the log-likelihood ratio at each step.
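The per-step log-likelihood ratio described above can be illustrated with a small sketch. It is not the released inference code: the function name `implicit_step_rewards`, the `beta` scaling coefficient, and the toy numbers are assumptions made for illustration. It also assumes the ratio is taken between the trained model and a reference model, whose per-token log-probabilities would in practice come from a forward pass of each model over the same response.

```python
from typing import List, Sequence


def implicit_step_rewards(
    policy_logps: Sequence[float],
    ref_logps: Sequence[float],
    step_ends: Sequence[int],
    beta: float = 1.0,
) -> List[float]:
    """Sum the per-token log-likelihood ratio inside each reasoning step.

    policy_logps / ref_logps: log-probability of each response token under
        the trained model and the (assumed) reference model.
    step_ends: exclusive end index of each step, in increasing order.
    beta: hypothetical scaling coefficient applied to the ratio.
    """
    if len(policy_logps) != len(ref_logps):
        raise ValueError("both models must score the same tokens")

    rewards = []
    start = 0
    for end in step_ends:
        # log(pi / pi_ref) summed over the tokens belonging to this step
        log_ratio = sum(
            p - r for p, r in zip(policy_logps[start:end], ref_logps[start:end])
        )
        rewards.append(beta * log_ratio)
        start = end
    return rewards


# Toy example: four response tokens split into two steps of two tokens each.
policy = [-1.0, -2.0, -0.5, -0.5]
reference = [-1.5, -2.5, -0.25, -0.25]
step_rewards = implicit_step_rewards(policy, reference, step_ends=[2, 4], beta=0.5)
print(step_rewards)
```

A positive reward means the trained model assigns the step a higher likelihood than the reference model does; a negative reward means the opposite.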

The key ingredient of Implicit PRM is the reward representation, as demonstrated below: