R3-Qwen3-8B-LoRA-4k is part of the R3 family, a series of Robust Rubric-Agnostic Reward Models. We perform SFT on the Qwen3 model family at the 4B, 8B, and 14B scales, as well as on Phi-4-reasoning-plus. Check out our paper for more information!
```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_path = "rubricreward/R3-Qwen3-8B-LoRA-4k"

tokenizer = AutoTokenizer.from_pretrained(model_path)

# Recommended sampling settings for Qwen3 in thinking mode.
sampling_params = SamplingParams(
    temperature=0.6, top_p=0.95, min_p=0, top_k=20, max_tokens=8192
)

llm = LLM(
    model=model_path,
    dtype="bfloat16",
    max_model_len=10000,
    tensor_parallel_size=2,
    gpu_memory_utilization=0.9,
    enforce_eager=True,
)

messages: list[dict[str, str]] = [
    {"role": "user", "content": "Evaluate the response based on the given task, input, response, and evaluation rubric. Provide a fair and detailed assessment following the rubric..."}
]

# Render the chat template into a prompt string for vLLM.
list_text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # Switch between thinking and non-thinking modes.
)

outputs = llm.generate(list_text, sampling_params)
```
R3 is licensed under the Apache 2.0 license.
```bibtex
@article{anugraha2025r3,
  title={R3: Robust Rubric-Agnostic Reward Models},
  author={Anugraha, David and Tang, Zilu and Miranda, Lester James V. and Zhao, Hanyang and Farhansyah, Mohammad Rifqi and Kuwanto, Garry and Wijaya, Derry and Winata, Genta Indra},
  journal={arXiv preprint arXiv:2505.13388},
  year={2025}
}
```