Llama3-8b Reward Model
This is the Llama3-8b-based Reward Model, trained using OpenRLHF, an efficient RLHF framework presented in the paper REINFORCE++: An Efficient RLHF Algorithm with Robustness to Both Prompt and Reward Models.
The model was trained with a combination of datasets available at OpenLLMAI/preference_700K.
Base SFT model: OpenRLHF/Llama-3-8b-sft-mixture
Training Configuration
- Scheduler: Cosine (with warmup)
- Learning Rate: 9e-6
- Warmup Ratio: 0.03
- Batch Size: 256
- Epochs: 1
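For reference, this learning-rate schedule maps onto the standard transformers cosine scheduler with warmup. The sketch below is illustrative only; the dummy module and step counts are placeholders, not the actual OpenRLHF training code.

# Minimal sketch of the schedule described above (illustrative placeholders).
import torch
from transformers import get_cosine_schedule_with_warmup

dummy = torch.nn.Linear(8, 1)                      # stand-in for the full 8B reward model
optimizer = torch.optim.AdamW(dummy.parameters(), lr=9e-6)

total_steps = 700_000 // 256                       # ~1 epoch over ~700K preference pairs (approximate)
warmup_steps = int(0.03 * total_steps)             # warmup ratio 0.03

scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=warmup_steps,
    num_training_steps=total_steps,
)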
Usage
You can use this model with the Hugging Face transformers library to score the quality of a generated response to a given prompt. The input format should match what the model was trained on (e.g., a full conversation turn using the Llama 3 chat template).
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
model_id = "OpenRLHF/Llama-3-8b-rm-mixture" # This model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Ensure to load with appropriate torch_dtype, e.g., torch.bfloat16 for Llama models
model = AutoModelForSequenceClassification.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")
# Example: Score responses to a user prompt
prompt = "Write a short poem about a cat."
response_good = (
    "A feline friend, soft and sleek,\n"
    "Curled up warm, a purring peek.\n"
    "Through sunlit naps and playful chase,\n"
    "Graceful paws in every space."
)
response_bad = "Cats are okay. They sit sometimes. Dog is better."
# Apply the chat template for the full conversation turn (user prompt + assistant response)
# The `apply_chat_template` method structures the input as expected by the model.
messages_good = [
    {"role": "user", "content": prompt},
    {"role": "assistant", "content": response_good},
]
messages_bad = [
    {"role": "user", "content": prompt},
    {"role": "assistant", "content": response_bad},
]
input_ids_good = tokenizer.apply_chat_template(messages_good, return_tensors="pt", add_generation_prompt=False).to(model.device)
input_ids_bad = tokenizer.apply_chat_template(messages_bad, return_tensors="pt", add_generation_prompt=False).to(model.device)
# Get scores
with torch.no_grad():
    score_good = model(input_ids_good).logits.item()
    score_bad = model(input_ids_bad).logits.item()
print(f"Score for good response: {score_good:.2f}")
print(f"Score for bad response: {score_bad:.2f}")