metadata
license: cc-by-nc-sa-4.0
datasets:
- Dahoas/rm-static
- Dahoas/synthetic-instruct-gptj-pairwise
- Anthropic/hh-rlhf
language:
- en
Model Card for Model ID
This model is a reward model for RLHF, fine-tuned using DeepSpeed Chat. It is based on OPT-350M.
Model Details
Model Description
- Developed by: The Kaitchup
- Model type: Reward model
- Language(s) (NLP): English
- License: cc-by-nc-sa-4.0
- Finetuned from model: facebook/opt-350m
Model Sources
The model has been trained with the procedure described in this article:
Train Instruct LLMs On Your GPU with DeepSpeed Chat — Step #2: Training a Reward Model
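For readers unfamiliar with reward models, the following is a minimal sketch of the pairwise ranking objective commonly used to train them on chosen/rejected data such as the datasets listed above. It is an illustration under assumptions, not the DeepSpeed Chat implementation: the function name `pairwise_ranking_loss` is hypothetical, and the scores are plain floats standing in for the scalar rewards the model would produce for each response.

```python
import math


def pairwise_ranking_loss(chosen_rewards, rejected_rewards):
    """Mean of -log(sigmoid(r_chosen - r_rejected)) over a batch of pairs.

    Hypothetical helper for illustration only: each list holds one scalar
    reward per example, for the preferred and the rejected response.
    """
    if len(chosen_rewards) != len(rejected_rewards):
        raise ValueError("Both lists must contain one reward per pair.")
    if not chosen_rewards:
        raise ValueError("At least one pair is required.")

    total = 0.0
    for r_chosen, r_rejected in zip(chosen_rewards, rejected_rewards):
        margin = r_chosen - r_rejected
        # -log(sigmoid(m)) equals log(1 + exp(-m)); this form avoids
        # overflow when the margin is a large negative or positive number.
        total += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return total / len(chosen_rewards)


if __name__ == "__main__":
    # The loss is small when chosen responses score higher than rejected ones.
    print(pairwise_ranking_loss([2.0, 1.5], [0.0, -0.5]))
    # The loss is large when the ranking is inverted.
    print(pairwise_ranking_loss([0.0, -0.5], [2.0, 1.5]))
```

Minimizing this loss pushes the reward of the preferred response above that of the rejected one, which is the behavior a reward model needs in the later RLHF step.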