# GPT-2 DPO Fine-Tuned Model

This repository contains a **GPT-2** model fine-tuned with **Direct Preference Optimization (DPO)** on preference data.

## Model Details

- **Base Model:** GPT-2
- **Fine-tuned on:** Preference optimization dataset
- **Training Method:** Direct Preference Optimization (DPO)
- **Hyperparameters** (a training sketch using these values appears at the end of this README):
  - Learning Rate: `1e-3`
  - Batch Size: `8`
  - Epochs: `5`
  - Beta: `0.1`

## Dataset

The model was trained on **`Dahoas/static-hh`**, a publicly available Hugging Face dataset designed for **human preference optimization**. Each example pairs a prompt with a **chosen** and a **rejected** response (see the loading example at the end of this README).

## Usage

Load the model and tokenizer from Hugging Face:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "PhuePwint/dpo_gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Generate a response
prompt = "What is the purpose of life?"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
output = model.generate(
    input_ids,
    max_length=100,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 defines no pad token
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
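
## Training Objective

The `Beta` hyperparameter above is the β in the DPO loss (Rafailov et al., 2023), which controls how strongly the policy is penalized for drifting from the frozen reference model:

```math
\mathcal{L}_{\mathrm{DPO}}(\theta) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\!\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
```

Here $y_w$ is the chosen response, $y_l$ the rejected one, and $\pi_{\mathrm{ref}}$ is the original GPT-2 policy before fine-tuning.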
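
## Inspecting the Dataset

The preference data can be inspected directly. A minimal sketch, assuming the `datasets` library and the field names (`prompt`, `chosen`, `rejected`) documented on the `Dahoas/static-hh` dataset card:

```python
from datasets import load_dataset

# Load the preference dataset used for fine-tuning
dataset = load_dataset("Dahoas/static-hh", split="train")

# Each example pairs a prompt with a preferred and a dispreferred response
example = dataset[0]
print(example["prompt"])    # the conversation prompt
print(example["chosen"])    # response annotators preferred
print(example["rejected"])  # response annotators rejected
```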
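
## Training Sketch

The run can be reproduced approximately with TRL's `DPOTrainer`, plugging in the hyperparameters from **Model Details**. This is a minimal sketch, not the exact script used for this checkpoint; argument names follow recent TRL releases (`DPOConfig`, `processing_class`) and may differ in older versions:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 defines no pad token

dataset = load_dataset("Dahoas/static-hh", split="train")

# Hyperparameters from the Model Details section above
config = DPOConfig(
    output_dir="dpo_gpt2",
    learning_rate=1e-3,
    per_device_train_batch_size=8,
    num_train_epochs=5,
    beta=0.1,  # β in the DPO loss: strength of the reference-model penalty
)

# With ref_model=None, TRL clones the initial model as the frozen reference
trainer = DPOTrainer(
    model=model,
    ref_model=None,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```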