Reward model based on deberta-v3-large-tasksource-nli, fine-tuned on Anthropic/hh-rlhf for 1 epoch with a 1e-5 learning rate.
The data are described in the paper: Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback.
Validation accuracy is currently the best publicly reported: 75.16% (vs. 69.25% for OpenAssistant/reward-model-deberta-v3-large-v2).
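
A minimal usage sketch for scoring a dialogue with this model. It assumes a standard single-logit sequence-classification head and hh-rlhf-style "Human:/Assistant:" input formatting; neither is stated explicitly above, so treat the formatting as an assumption and check the hub config before relying on it.

```python
# Sketch: score one dialogue with the reward model.
# Assumptions (not stated in the card): single-logit classification head,
# and hh-rlhf-style "\n\nHuman: ... \n\nAssistant: ..." input formatting.
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "sileod/deberta-v3-large-tasksource-rlhf-reward-model"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id).eval()

dialogue = (
    "\n\nHuman: Explain nuclear fusion in one sentence."
    "\n\nAssistant: Fusion is when two light atomic nuclei merge into a heavier one, releasing energy."
)
inputs = tokenizer(dialogue, return_tensors="pt", truncation=True)
with torch.no_grad():
    reward = model(**inputs).logits[0, 0].item()
print(reward)  # higher score = judged more helpful/harmless
```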
Dataset used to train sileod/deberta-v3-large-tasksource-rlhf-reward-model: Anthropic/hh-rlhf
Evaluation results
- accuracy on the Anthropic/hh-rlhf validation set (self-reported): 0.7516
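
For reference, a sketch of how pairwise accuracy on hh-rlhf is typically computed: the model is counted correct when it scores the "chosen" response above the "rejected" one. The exact preprocessing behind the reported 75.16% is not specified here, so this is illustrative only, and the evaluation split name ("test") is taken from the public Anthropic/hh-rlhf dataset.

```python
# Illustrative pairwise-accuracy loop over the hh-rlhf test split.
# Assumes single-text inputs and one reward logit; the preprocessing
# used for the reported 75.16% may differ.
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "sileod/deberta-v3-large-tasksource-rlhf-reward-model"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id).eval()

def score(text: str) -> float:
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        return model(**inputs).logits[0, 0].item()

ds = load_dataset("Anthropic/hh-rlhf", split="test")
correct = sum(score(ex["chosen"]) > score(ex["rejected"]) for ex in ds)
print(f"pairwise accuracy: {correct / len(ds):.4f}")
```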