# GPT-2 DPO Fine-Tuned Model

This repository contains a fine-tuned **GPT-2** model trained with **Direct Preference Optimization (DPO)** on preference data.
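DPO (Rafailov et al., 2023) trains directly on preference pairs, without a separate reward model, by minimizing the objective below over prompts $x$ with chosen responses $y_w$ and rejected responses $y_l$; the `Beta` value listed under **Model Details** plays the role of the temperature $\beta$:

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x,\, y_w,\, y_l)\sim \mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$
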
## Model Details

- **Base Model:** GPT-2
- **Fine-tuned on:** `Dahoas/static-hh` human-preference pairs (see **Dataset** below)
- **Training Method:** Direct Preference Optimization (DPO)
- **Hyperparameters:**
  - Learning Rate: `1e-3`
  - Batch Size: `8`
  - Epochs: `5`
  - Beta: `0.1`

These settings map directly onto a standard DPO trainer configuration; see the training sketch below.
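Training code is not included in this repository. The following is a rough, hypothetical sketch of how the hyperparameters above could be wired into the TRL library's `DPOTrainer` (using TRL here is an assumption, and the `Dahoas/static-hh` columns may need to be mapped to the `prompt`/`chosen`/`rejected` format the trainer expects):

```python
# Hypothetical reconstruction of the training setup; not the actual script used for this model.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 ships without a pad token

train_dataset = load_dataset("Dahoas/static-hh", split="train")

config = DPOConfig(
    output_dir="dpo_gpt2",
    beta=0.1,                        # DPO temperature (the Beta hyperparameter above)
    learning_rate=1e-3,
    per_device_train_batch_size=8,
    num_train_epochs=5,
)

trainer = DPOTrainer(
    model=model,                     # a frozen reference model is created automatically if none is passed
    args=config,
    train_dataset=train_dataset,
    processing_class=tokenizer,      # named `tokenizer=` in older TRL releases
)
trainer.train()
```
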
## Dataset

The model was trained on **`Dahoas/static-hh`**, a publicly available Hugging Face dataset designed for **human preference optimization**. Each example pairs a prompt with a corresponding **chosen** and **rejected** response.
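To inspect the data, it can be loaded with the `datasets` library; the field names below follow the description above, so verify them against the dataset card:

```python
from datasets import load_dataset

# Load the preference data used for DPO fine-tuning
dataset = load_dataset("Dahoas/static-hh", split="train")

example = dataset[0]
print(example.keys())         # expected fields include "prompt", "chosen", "rejected"
print(example["prompt"])      # conversation prompt
print(example["chosen"])      # preferred (human-chosen) response
print(example["rejected"])    # dispreferred response
```
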
## Usage

Load the model and tokenizer from the Hugging Face Hub:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "PhuePwint/dpo_gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Generate a response with greedy decoding
prompt = "What is the purpose of life?"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(
    **inputs,
    max_length=100,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no dedicated pad token
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
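Greedy decoding tends to be repetitive for GPT-2, so you may prefer sampling; the settings below are illustrative (not tuned) and continue from the example above:

```python
# Sampled generation: reuses `model`, `tokenizer`, and `inputs` from the example above
output = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,                       # sample instead of greedy decoding
    top_p=0.9,                            # nucleus sampling
    temperature=0.7,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```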