GPT-2 DPO Fine-Tuned Model

This repository contains a GPT-2 model fine-tuned with Direct Preference Optimization (DPO) on human preference data.

Model Details

  • Base Model: GPT-2 (124M parameters)
  • Fine-tuned on: Dahoas/static-hh preference pairs (prompt, chosen, rejected)
  • Training Method: Direct Preference Optimization (DPO)
  • Hyperparameters:
    • Learning Rate: 1e-3
    • Batch Size: 8
    • Epochs: 5
    • Beta: 0.1
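
The exact training script is not included in this repository. As a rough, hedged sketch of how a comparable run could be set up (assuming the trl library's DPOTrainer; argument names vary slightly across trl versions), using the hyperparameters listed above:

from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Policy model starts from base GPT-2; trl creates the frozen reference copy internally
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token

# Preference data with prompt / chosen / rejected columns (see Dataset section)
dataset = load_dataset("Dahoas/static-hh", split="train")

# Hyperparameters mirror the values listed above
config = DPOConfig(
    output_dir="dpo_gpt2",
    learning_rate=1e-3,
    per_device_train_batch_size=8,
    num_train_epochs=5,
    beta=0.1,  # strength of the penalty keeping the policy close to the reference model
)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,  # named `tokenizer=` in older trl releases
)
trainer.train()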

Dataset

The model was trained on Dahoas/static-hh, a publicly available Hugging Face dataset for human preference learning. Each example pairs a prompt with a chosen (preferred) and a rejected (dispreferred) response.
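
For reference, the data can be inspected with the datasets library. The snippet below is a small illustrative example (assuming the prompt, chosen, and rejected column names used by this dataset):

from datasets import load_dataset

dataset = load_dataset("Dahoas/static-hh", split="train")
example = dataset[0]
print(example["prompt"])    # conversation prompt
print(example["chosen"])    # preferred response
print(example["rejected"])  # dispreferred response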

Usage

Load the model and tokenizer from the Hugging Face Hub and generate a completion:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "PhuePwint/dpo_gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Generate a response to a prompt
prompt = "What is the purpose of life?"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(
    **inputs,
    max_new_tokens=100,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token; reuse EOS
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
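
Greedy decoding as above can produce repetitive text. A common alternative is to sample; the settings below are illustrative defaults, not values used during training or evaluation:

# Sampled generation (illustrative settings)
output = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,       # sample from the distribution instead of greedy decoding
    top_p=0.9,            # nucleus sampling
    temperature=0.7,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))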