---
library_name: transformers
model_name: TinyLlama-1.1B-DPO-HumanLike
tags:
- generated_from_trainer
- trl
- dpo
license: apache-2.0
---

# Model Card for TinyLlama-1.1B-DPO-HumanLike
This model is a DPO fine-tuned version of [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0). It has been trained using [TRL](https://github.com/huggingface/trl).
## Quick start

```python
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="Josh1207/TinyLlama-1.1B-DPO-HumanLike",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {
        "role": "system",
        "content": "You are a friendly chatbot who always responds in the style of a pirate",
    },
    {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
]

# Build the prompt with the model's chat template, then sample a reply.
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
# <|system|>
# You are a friendly chatbot who always responds in the style of a pirate.</s>
# <|user|>
# How many helicopters can a human eat in one sitting?</s>
# <|assistant|>
# ...
```
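On recent `transformers` releases (including the 4.53.1 listed under framework versions below), the `text-generation` pipeline can also take the chat messages directly and apply the chat template for you. A minimal variant under that assumption:

```python
# Pass the messages list directly; the pipeline applies the chat template itself.
outputs = pipe(messages, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"][-1]["content"])  # the assistant's reply
```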
## Training procedure

The following DPO hyperparameters were used during training (a sketch of how they map onto TRL's trainer follows below):

- `loss_type`: sigmoid
- `learning_rate`: 1e-5
- `beta`: 0.3
- `epochs`: 2
- `batch_size`: 2
- `gradient_accumulation_steps`: 4
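As a rough illustration, these settings correspond to a TRL `DPOConfig`/`DPOTrainer` setup along the following lines. This is a minimal sketch, not the exact training script: the preference dataset name, output directory, and model-loading details are assumptions, since the card does not state them.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Hypothetical preference dataset with "prompt"/"chosen"/"rejected" columns;
# the actual dataset used for this model is not stated in the card.
train_dataset = load_dataset("your-org/human-like-preferences", split="train")

# Hyperparameters taken from the list above.
training_args = DPOConfig(
    output_dir="TinyLlama-1.1B-DPO-HumanLike",
    loss_type="sigmoid",
    learning_rate=1e-5,
    beta=0.3,
    num_train_epochs=2,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
)

trainer = DPOTrainer(
    model=model,            # ref_model is omitted; TRL creates a frozen copy by default
    args=training_args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()
```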
This model was trained with DPO, a method introduced in [Direct Preference Optimization: Your Language Model is Secretly a Reward Model](https://huggingface.co/papers/2305.18290).
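For reference, the `sigmoid` loss type above is the original DPO objective: given a prompt $x$ with a preferred completion $y_w$ and a dispreferred completion $y_l$, the policy $\pi_\theta$ is trained against a frozen reference model $\pi_{\mathrm{ref}}$ with

$$
\mathcal{L}_{\mathrm{DPO}}(\theta) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right],
$$

where $\sigma$ is the logistic function and $\beta$ is the 0.3 listed above.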
### Framework versions

- TRL: 0.19.1
- Transformers: 4.53.1
- PyTorch: 2.7.1
- Datasets: 3.6.0
- Tokenizers: 0.21.4.dev0
## Citations
Cite DPO as:
```bibtex
@inproceedings{rafailov2023direct,
    title     = {{Direct Preference Optimization: Your Language Model is Secretly a Reward Model}},
    author    = {Rafael Rafailov and Archit Sharma and Eric Mitchell and Christopher D. Manning and Stefano Ermon and Chelsea Finn},
    year      = 2023,
    booktitle = {Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023},
    editor    = {Alice Oh and Tristan Naumann and Amir Globerson and Kate Saenko and Moritz Hardt and Sergey Levine},
    url       = {http://papers.nips.cc/paper_files/paper/2023/hash/a85b405ed65c6477a4fe8302b5e06ce7-Abstract-Conference.html}
}
```
Cite TRL as:
```bibtex
@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
```