Model Card for kaitchup/OPT-1.3B-RLHF-DSChatLoRA

This is a chat model fine-tuned with RLHF using DeepSpeed Chat and LoRA. It is based on OPT-1.3B.
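As a rough starting point, the sketch below shows one way the model might be loaded and queried with the Hugging Face transformers library. It rests on several assumptions that this card does not confirm: that the repository holds full causal-LM weights with the LoRA layers already merged, that it ships a tokenizer (if not, the base OPT-1.3B tokenizer would be the natural fallback), and that a Human/Assistant prompt template suits the model. The helper names `build_prompt` and `generate_reply` are hypothetical and exist only for illustration.

```python
MODEL_ID = "kaitchup/OPT-1.3B-RLHF-DSChatLoRA"  # repository name as shown on this page


def build_prompt(user_message: str) -> str:
    """Wrap a message in a Human/Assistant template.

    Assumption: the template is not documented in this card and may need adjusting.
    """
    return f"Human: {user_message.strip()}\n\nAssistant:"


def generate_reply(user_message: str, max_new_tokens: int = 128) -> str:
    """Generate a reply for one user message (downloads weights on first call)."""
    # Imported lazily so build_prompt works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the repo contains merged weights and a tokenizer.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(build_prompt(user_message), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
```

Calling `generate_reply("What is reinforcement learning?")` would download the weights on first use and return the generated text. If the repository turns out to contain only LoRA adapter weights, loading would instead need to go through an adapter-aware library on top of the base model.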

Model Details

Model Description

Model Sources

The model has been trained with the procedure described in this article:

Train Instruct LLMs On Your GPU with DeepSpeed Chat — Step #3: Reinforcement Learning with Human Feedback


Dataset used to train kaitchup/OPT-1.3B-RLHF-DSChatLoRA