# Community Tutorials
Community tutorials are made by active members of the Hugging Face community who want to share their knowledge and expertise with others. They are a great way to learn about the library and its features, and to get started with core classes and modalities.
## Language Models
### Tutorials
| Task | Class | Description | Author | Tutorial | Colab |
|---|---|---|---|---|---|
| Reinforcement Learning | GRPOTrainer | Efficient Online Training with GRPO and vLLM in TRL | Sergio Paniego | Link | |
| Reinforcement Learning | GRPOTrainer | Post training an LLM for reasoning with GRPO in TRL | Sergio Paniego | Link | |
| Reinforcement Learning | GRPOTrainer | Mini-R1: Reproduce the DeepSeek R1 "aha moment", an RL tutorial | Philipp Schmid | Link | |
| Reinforcement Learning | GRPOTrainer | RL on LLaMA 3.1-8B with GRPO and Unsloth optimizations | Andrea Manzoni | Link | |
| Instruction tuning | SFTTrainer | Fine-tuning Google Gemma LLMs using ChatML format with QLoRA | Philipp Schmid | Link | |
| Structured Generation | SFTTrainer | Fine-tuning Llama-2-7B to generate Persian product catalogs in JSON using QLoRA and PEFT | Mohammadreza Esmaeilian | Link | |
| Preference Optimization | DPOTrainer | Align Mistral-7b using Direct Preference Optimization for human preference alignment | Maxime Labonne | Link | |
| Preference Optimization | ORPOTrainer | Fine-tuning Llama 3 with ORPO combining instruction tuning and preference alignment | Maxime Labonne | Link | |
| Instruction tuning | SFTTrainer | How to fine-tune open LLMs in 2025 with Hugging Face | Philipp Schmid | Link | |
### Videos
| Task | Title | Author | Video |
|---|---|---|---|
| Instruction tuning | Fine-tuning open AI models using Hugging Face TRL | Wietse Venema | |
| Instruction tuning | How to fine-tune a smol-LM with Hugging Face, TRL, and the smoltalk Dataset | Mayurji | |
⚠️ Deprecated features notice for "How to fine-tune a smol-LM with Hugging Face, TRL, and the smoltalk Dataset":
The tutorial uses two deprecated features:

- `SFTTrainer(..., tokenizer=tokenizer)`: use `SFTTrainer(..., processing_class=tokenizer)` instead, or simply omit it (it will be inferred from the model).
- `setup_chat_format(model, tokenizer)`: use `SFTConfig(..., chat_template_path="Qwen/Qwen3-0.6B")`, where `chat_template_path` specifies the model whose chat template you want to copy.
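For reference, a minimal sketch of the non-deprecated API (the base model and dataset names below are illustrative assumptions chosen to match the tutorial's theme, not taken from it):

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Illustrative choices: any chat-formatted dataset and causal LM work here.
dataset = load_dataset("HuggingFaceTB/smoltalk", "all", split="train")

config = SFTConfig(
    output_dir="smollm2-sft",
    # Replaces setup_chat_format(model, tokenizer): copies the chat
    # template from the referenced model.
    chat_template_path="Qwen/Qwen3-0.6B",
)

trainer = SFTTrainer(
    model="HuggingFaceTB/SmolLM2-135M",  # illustrative base model
    args=config,
    train_dataset=dataset,
    # processing_class=tokenizer replaces the deprecated tokenizer=...
    # argument; omitted here, so it is inferred from the model.
)
trainer.train()
```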
## Vision Language Models
### Tutorials
| Task | Class | Description | Author | Tutorial | Colab |
|---|---|---|---|---|---|
| Visual QA | SFTTrainer | Fine-tuning Qwen2-VL-7B for visual question answering on ChartQA dataset | Sergio Paniego | Link | |
| Visual QA | SFTTrainer | Fine-tuning SmolVLM with TRL on a consumer GPU | Sergio Paniego | Link | |
| SEO Description | SFTTrainer | Fine-tuning Qwen2-VL-7B for generating SEO-friendly descriptions from images | Philipp Schmid | Link | |
| Visual QA | DPOTrainer | PaliGemma 🤝 Direct Preference Optimization | Merve Noyan | Link | |
| Visual QA | DPOTrainer | Fine-tuning SmolVLM using direct preference optimization (DPO) with TRL on a consumer GPU | Sergio Paniego | Link | |
| Object Detection Grounding | SFTTrainer | Fine-tuning a VLM for Object Detection Grounding using TRL | Sergio Paniego | Link | |
| Visual QA | DPOTrainer | Fine-Tuning a Vision Language Model with TRL using MPO | Sergio Paniego | Link | |
| Reinforcement Learning | GRPOTrainer | Post training a VLM for reasoning with GRPO using TRL | Sergio Paniego | Link | |
## Contributing
If you have a tutorial that you would like to add to this list, please open a PR to add it. We will review it and merge it if it is relevant to the community.
