# GPT-2 DPO Fine-Tuned Model

This repository contains a **GPT-2** model fine-tuned with **Direct Preference Optimization (DPO)** on preference data.

## Model Details
- **Base Model:** GPT-2
- **Fine-tuned on:** `Dahoas/static-hh` preference dataset (see below)
- **Training Method:** Direct Preference Optimization (DPO)
- **Hyperparameters:**
  - Learning Rate: `1e-3`
  - Batch Size: `8`
  - Epochs: `5`
  - Beta: `0.1` (the β in the DPO objective shown after this list)

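For reference, β is the coefficient in the standard DPO objective from Rafailov et al. (2023), which trades off fitting the preference data against staying close to the frozen reference (pre-DPO) model:

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) =
-\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\!\left[
\log \sigma\!\left(
\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
- \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
\right)\right]
$$

where `y_w` and `y_l` are the chosen and rejected responses for prompt `x`; a larger β keeps the fine-tuned policy closer to the reference model.
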
## Dataset
The dataset used for training is **`Dahoas/static-hh`**, a publicly available dataset on Hugging Face designed for **human preference optimization**. It consists of prompts paired with corresponding **chosen** and **rejected** responses.

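The exact training script is not included in this repository. As a rough sketch, a comparable run could be set up with the `trl` library's `DPOTrainer` using the hyperparameters listed above; the `trl` argument names, the dataset column selection, and the pad-token handling below are assumptions, not the code used to produce this checkpoint:

```python
# Hypothetical sketch only: hyperparameters mirror the Model Details list,
# but this is not the exact script used to train this checkpoint.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 ships without a pad token

# Keep only the preference columns DPOTrainer expects (names assumed from the dataset card).
train_dataset = load_dataset("Dahoas/static-hh", split="train")
train_dataset = train_dataset.select_columns(["prompt", "chosen", "rejected"])

config = DPOConfig(
    output_dir="dpo_gpt2",
    beta=0.1,                        # strength of the preference/KL-style penalty
    learning_rate=1e-3,
    per_device_train_batch_size=8,
    num_train_epochs=5,
)

trainer = DPOTrainer(
    model=model,                     # a frozen reference copy is created automatically
    args=config,
    train_dataset=train_dataset,
    processing_class=tokenizer,      # older trl releases use `tokenizer=` instead
)
trainer.train()
```
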
## Usage
Load the model and tokenizer from Hugging Face:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "PhuePwint/dpo_gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Generate a response for a sample prompt
prompt = "What is the purpose of life?"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(
    **inputs,
    max_length=100,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token; avoids a generation warning
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
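
The same model can also be run through the high-level `text-generation` pipeline:

```python
from transformers import pipeline

# Loads the fine-tuned model and its tokenizer in one call
generator = pipeline("text-generation", model="PhuePwint/dpo_gpt2")
result = generator("What is the purpose of life?", max_new_tokens=80, do_sample=True)
print(result[0]["generated_text"])
```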