---
license: llama3
---

# Quantile Regression for Distributional Reward Models in RLHF

+ **Author:** Nicolai Dorka
+ **Tech Report:** https://arxiv.org/abs/2409.10164
+ **Code Repository:** https://github.com/Nicolinho/QRM
+ **Method Overview:** QRM generates a distribution over rewards by aggregating individual distributions over attribute scores like helpfulness and harmlessness.
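The aggregation idea in the method overview can be illustrated with a minimal, pure-Python sketch. It rests on assumptions that are not taken from the tech report or the QRM code. Each attribute head is assumed to predict quantiles at the same 19 levels (0.05, 0.10, ..., 0.95) that the demo code reports for the reward distribution. A gating network is assumed to supply one non-negative weight per attribute. The reward quantile at each level is then taken to be the weight-normalised sum of the attribute quantiles at that level. The helper names `aggregate_quantiles`, `expected_reward` and `lower_tail_mean`, the attribute values and the weights are all hypothetical.

```python
from typing import Dict, List

# Quantile levels 0.05, 0.10, ..., 0.95 (19 levels), as listed in the demo code.
QUANTILE_LEVELS: List[float] = [round(0.05 * k, 2) for k in range(1, 20)]


def aggregate_quantiles(
    attribute_quantiles: Dict[str, List[float]],
    gating_weights: Dict[str, float],
) -> List[float]:
    """Combine per-attribute quantiles into reward quantiles.

    ASSUMPTION: the reward quantile at each level is the weight-normalised
    sum of the attribute quantiles at that level. Illustrative only; see the
    tech report for the rule QRM actually uses.
    """
    total = sum(gating_weights[name] for name in attribute_quantiles)
    if total <= 0:
        raise ValueError("gating weights must have a positive sum")
    n = len(QUANTILE_LEVELS)
    reward = [0.0] * n
    for name, quantiles in attribute_quantiles.items():
        if len(quantiles) != n:
            raise ValueError(f"{name}: expected {n} quantiles, got {len(quantiles)}")
        weight = gating_weights[name] / total  # normalise so the weights sum to 1
        for i, q in enumerate(quantiles):
            reward[i] += weight * q
    return reward


def expected_reward(reward_quantiles: List[float]) -> float:
    """Rough estimate of the mean: the average of the evenly spaced quantiles."""
    return sum(reward_quantiles) / len(reward_quantiles)


def lower_tail_mean(reward_quantiles: List[float], alpha: float) -> float:
    """Average of the quantiles at levels <= alpha (a CVaR-style, risk-averse score)."""
    tail = [q for tau, q in zip(QUANTILE_LEVELS, reward_quantiles) if tau <= alpha + 1e-9]
    if not tail:
        raise ValueError("alpha is below the smallest quantile level")
    return sum(tail) / len(tail)


if __name__ == "__main__":
    # Hypothetical per-attribute quantiles (non-decreasing across levels).
    attribute_quantiles = {
        "helpfulness": [0.1 * k for k in range(19)],
        "harmlessness": [1.0] * 19,
    }
    # Hypothetical gating weights; normalised inside aggregate_quantiles.
    gating_weights = {"helpfulness": 3.0, "harmlessness": 1.0}

    reward_q = aggregate_quantiles(attribute_quantiles, gating_weights)
    print(f"expected reward ~ {expected_reward(reward_q):.3f}")       # 0.925
    print(f"lower-25% mean  = {lower_tail_mean(reward_q, 0.25):.3f}")  # 0.400
```

A non-negative weighted sum of non-decreasing sequences is itself non-decreasing, so under this assumption the aggregated values remain a valid set of quantiles. The plain average over the evenly spaced levels is only a rough stand-in for the expectation. The model's own `score` output may be computed differently. The lower-tail mean is one example of a risk-averse summary that a distributional reward makes available and a single scalar reward does not.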


This model uses [Skywork/Skywork-Reward-Llama-3.1-8B-v0.2](https://huggingface.co/Skywork/Skywork-Reward-Llama-3.1-8B-v0.2) as its backbone, and [Skywork/Skywork-Reward-Preference-80K-v0.2](https://huggingface.co/datasets/Skywork/Skywork-Reward-Preference-80K-v0.2) was used to train the gating network. Apart from this, it was trained exactly as described in the tech report.

## Demo Code

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

device = "cuda"
path = "nicolinho/QRM-Llama3.1-8B-v2"
model = AutoModelForSequenceClassification.from_pretrained(
    path, torch_dtype=torch.bfloat16, device_map=device, trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(path, use_fast=True)

# A hand-written example prompt and response to score
prompt = 'Does pineapple belong on a Pizza?'
response = "There are different opinions on this. Some people like pineapple on a Pizza while others condemn this."
messages = [{"role": "user", "content": prompt},
            {"role": "assistant", "content": response}]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(device)

with torch.no_grad():
    output = model(input_ids)
    # Expectation of the reward distribution
    reward = output.score.cpu().float()
    # Quantile estimates for the quantiles 0.05, 0.1, ..., 0.9, 0.95 representing the distribution over rewards
    reward_quantiles = output.reward_quantiles.cpu().float()

# The attributes of the 5 reward objectives
attributes = ['helpsteer-helpfulness', 'helpsteer-correctness', 'helpsteer-coherence',
              'helpsteer-complexity', 'helpsteer-verbosity']
```

## Citation

If you find this work useful for your research, please consider citing:

```
@article{dorka2024quantile,
  title={Quantile Regression for Distributional Reward Models in RLHF},
  author={Dorka, Nicolai},
  journal={arXiv preprint arXiv:2409.10164},
  year={2024}
}
```