# GPT-2 DPO Fine-Tuned Model

This repository contains a fine-tuned **GPT-2** model trained with **Direct Preference Optimization (DPO)** on preference data.
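DPO (Rafailov et al., 2023) trains directly on preference pairs, without a separate reward model, by minimizing the objective below over prompts $x$ with chosen responses $y_w$ and rejected responses $y_l$; the `Beta` value listed under **Model Details** plays the role of the temperature $\beta$:

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x,\, y_w,\, y_l)\sim \mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$
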
## Model Details

- **Base Model:** GPT-2
- **Fine-tuned on:** `Dahoas/static-hh` human-preference pairs (see **Dataset** below)
- **Training Method:** Direct Preference Optimization (DPO)
- **Hyperparameters:**
  - Learning Rate: `1e-3`
  - Batch Size: `8`
  - Epochs: `5`
  - Beta: `0.1`

These settings map directly onto a standard DPO trainer configuration; see the training sketch below.
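Training code is not included in this repository. The following is a rough, hypothetical sketch of how the hyperparameters above could be wired into the TRL library's `DPOTrainer` (using TRL here is an assumption, and the `Dahoas/static-hh` columns may need to be mapped to the `prompt`/`chosen`/`rejected` format the trainer expects):

```python
# Hypothetical reconstruction of the training setup; not the actual script used for this model.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 ships without a pad token

train_dataset = load_dataset("Dahoas/static-hh", split="train")

config = DPOConfig(
    output_dir="dpo_gpt2",
    beta=0.1,                        # DPO temperature (the Beta hyperparameter above)
    learning_rate=1e-3,
    per_device_train_batch_size=8,
    num_train_epochs=5,
)

trainer = DPOTrainer(
    model=model,                     # a frozen reference model is created automatically if none is passed
    args=config,
    train_dataset=train_dataset,
    processing_class=tokenizer,      # named `tokenizer=` in older TRL releases
)
trainer.train()
```
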
## Dataset

The model was trained on **`Dahoas/static-hh`**, a publicly available Hugging Face dataset designed for **human preference optimization**. Each example pairs a prompt with a corresponding **chosen** and **rejected** response.
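To inspect the data, it can be loaded with the `datasets` library; the field names below follow the description above, so verify them against the dataset card:

```python
from datasets import load_dataset

# Load the preference data used for DPO fine-tuning
dataset = load_dataset("Dahoas/static-hh", split="train")

example = dataset[0]
print(example.keys())         # expected fields include "prompt", "chosen", "rejected"
print(example["prompt"])      # conversation prompt
print(example["chosen"])      # preferred (human-chosen) response
print(example["rejected"])    # dispreferred response
```
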
## Usage

Load the model and tokenizer from the Hugging Face Hub:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "PhuePwint/dpo_gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Generate a response with greedy decoding
prompt = "What is the purpose of life?"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(
    **inputs,
    max_length=100,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no dedicated pad token
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
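Greedy decoding tends to be repetitive for GPT-2, so you may prefer sampling; the settings below are illustrative (not tuned) and continue from the example above:

```python
# Sampled generation: reuses `model`, `tokenizer`, and `inputs` from the example above
output = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,                       # sample instead of greedy decoding
    top_p=0.9,                            # nucleus sampling
    temperature=0.7,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```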