---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
---

# EurusPRM-Stage2

## Links

- 📜 [Blog](https://curvy-check-498.notion.site/Process-Reinforcement-through-Implicit-Rewards-15f4fcb9c42180f1b498cc9b2eaf896f)
- 🤗 [PRIME Collection](https://huggingface.co/PRIME-RL)
- 🤗 [Training Data](https://huggingface.co/datasets/PRIME-RL/EurusPRM-Stage1-Data)

## Introduction

EurusPRM-Stage1 is trained with **[Implicit PRM](https://arxiv.org/abs/2412.01981)**, which obtains process rewards at no additional cost: it only requires training an ORM on the cheaper response-level labels. During inference, implicit process rewards are obtained by running a forward pass and computing the log-likelihood ratio at each step. Stage1 serves as a strong foundation for the further training of **[EurusPRM-Stage2](https://huggingface.co/PRIME-RL/EurusPRM-Stage2)**.

The key ingredient of Implicit PRM is the reward representation, as demonstrated below:
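As a concrete illustration of the per-step log-likelihood ratio computation described above, here is a minimal sketch. The function name, the example inputs, and the default `beta` value are assumptions for illustration; in practice, the per-token log-probabilities would come from forward passes of the trained implicit PRM and the reference model.

```python
def implicit_step_rewards(policy_logprobs, ref_logprobs, step_boundaries, beta=0.05):
    """Sketch: implicit per-step process rewards as scaled log-likelihood ratios.

    policy_logprobs / ref_logprobs: per-token log-probabilities of the response
    under the implicit PRM and the reference model (hypothetical inputs here).
    step_boundaries: token indices where each reasoning step ends (exclusive).
    beta: scaling coefficient of the implicit reward parameterization (assumed).
    """
    rewards, start = [], 0
    for end in step_boundaries:
        # Reward of a step is beta times the difference of cumulative log-ratios
        # before and after the step, i.e. the log-ratio summed over its tokens.
        log_ratio = sum(p - r for p, r in zip(policy_logprobs[start:end],
                                              ref_logprobs[start:end]))
        rewards.append(beta * log_ratio)
        start = end
    return rewards
```

Because the cumulative log-ratios telescope, each step's reward only depends on the tokens inside that step, so the whole response needs just one forward pass per model.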