# RLPR-Gemma2-2B-it

**RLPR: Extrapolating RLVR to General Domains without Verifiers**
RLPR-Gemma2-2B-it is trained from Gemma2-2B-it with the RLPR framework, which eliminates the reliance on external verifiers and is simple to generalize to more domains.
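The core idea can be illustrated with a small sketch: instead of calling an external verifier, the reward is derived from the probability the policy itself assigns to the reference-answer tokens. This is a minimal illustration only; the function name and the mean-probability aggregation are assumptions for exposition, not the released training code.

```python
import math

def probability_reward(step_logits, reference_ids):
    """Mean probability the policy assigns to the reference-answer tokens.

    Illustrative sketch of a verifier-free, probability-based reward:
    step_logits   -- list of per-step logit lists (one list per answer token)
    reference_ids -- list of reference-answer token ids, one per step
    """
    total = 0.0
    for logits, ref_id in zip(step_logits, reference_ids):
        m = max(logits)                           # subtract max for a stable softmax
        exps = [math.exp(x - m) for x in logits]
        total += exps[ref_id] / sum(exps)         # probability of the reference token
    return total / len(reference_ids)

# Toy example: 3 decoding steps over a 4-token vocabulary.
logits = [[4.0, 0.0, 0.0, 0.0],
          [0.0, 4.0, 0.0, 0.0],
          [0.0, 0.0, 4.0, 0.0]]
high = probability_reward(logits, [0, 1, 2])  # reference tokens are likely -> high reward
low = probability_reward(logits, [3, 3, 3])   # reference tokens are unlikely -> low reward
```

Because the reward is computed from the policy's own token probabilities, any domain with a reference answer can be used for RL training, without writing a domain-specific verifier.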
## Usage

```python
# pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("openbmb/RLPR-Gemma2-2B-it")
model = AutoModelForCausalLM.from_pretrained(
    "openbmb/RLPR-Gemma2-2B-it",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

input_text = "Write me a poem about Machine Learning."
# Move inputs to wherever device_map placed the model (works on CPU-only setups too).
input_ids = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**input_ids, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))
```
## Citation

If you find our model, code, or paper helpful, please consider citing our paper:
```bibtex
@misc{yu2025rlprextrapolatingrlvrgeneral,
      title={RLPR: Extrapolating RLVR to General Domains without Verifiers},
      author={Tianyu Yu and Bo Ji and Shouli Wang and Shu Yao and Zefan Wang and Ganqu Cui and Lifan Yuan and Ning Ding and Yuan Yao and Zhiyuan Liu and Maosong Sun and Tat-Seng Chua},
      year={2025},
      eprint={2506.18254},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2506.18254},
}
```