GPT-2 DPO Fine-Tuned Model
This repository contains a fine-tuned GPT-2 model trained using Direct Preference Optimization (DPO) on preference-based data.
Model Details
- Base Model: GPT-2
- Fine-tuned on: Dahoas/static-hh preference dataset (see the Dataset section)
- Training Method: Direct Preference Optimization (DPO)
- Hyperparameters:
  - Learning Rate: 1e-3
  - Batch Size: 8
  - Epochs: 5
  - Beta: 0.1 (see the DPO loss sketch below)
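
For reference, Beta is the temperature of the DPO objective. Below is a minimal PyTorch sketch of that pairwise loss, not the actual training script; the per-sequence log-probability tensors are hypothetical inputs you would compute from the fine-tuned policy and a frozen reference copy of GPT-2.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Minimal DPO loss sketch.

    Each argument is a 1-D tensor of summed log-probabilities of the
    chosen / rejected responses under the policy or the frozen
    reference model (hypothetical inputs, for illustration only).
    """
    # Log-ratio of policy vs. reference for each response
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # Beta scales how strongly the policy is pushed toward chosen responses
    logits = beta * (chosen_logratio - rejected_logratio)
    return -F.logsigmoid(logits).mean()
```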
Dataset
The model was fine-tuned on `Dahoas/static-hh`, a publicly available Hugging Face dataset designed for human preference optimization. Each example pairs a prompt with a chosen (preferred) response and a rejected response.
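
To inspect the preference pairs yourself, the dataset can be loaded with the `datasets` library. This is a quick sketch; the `prompt`/`chosen`/`rejected` column names follow the dataset card and may differ if the dataset is updated.

```python
from datasets import load_dataset

# Load the preference dataset used for DPO fine-tuning
dataset = load_dataset("Dahoas/static-hh", split="train")

# Each example pairs a prompt with a preferred and a rejected response
example = dataset[0]
print(example["prompt"])
print("Chosen:", example["chosen"])
print("Rejected:", example["rejected"])
```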
Usage
Load the model and tokenizer from the Hugging Face Hub and generate a response:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "PhuePwint/dpo_gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Generate a response to a prompt
prompt = "What is the purpose of life?"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
output = model.generate(input_ids, max_length=100)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
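
Because the preference data is dialogue-style, prompts written in the same Human/Assistant format tend to work better. The snippet below is a sketch that assumes that formatting and uses sampling-based decoding; adjust the template and decoding parameters to taste.

```python
# Format the prompt in the Human/Assistant style used by the training data
# (assumed formatting; adjust if your prompts differ)
prompt = "\n\nHuman: What is the purpose of life?\n\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt")

output = model.generate(
    **inputs,
    max_new_tokens=100,                   # generate up to 100 new tokens
    do_sample=True,                       # sample instead of greedy decoding
    top_p=0.9,
    temperature=0.7,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no dedicated pad token
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```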