agents-course (Hugging Face Agents Course)

So you can now SFT a model with hf jobs + TRL in ONE command lol 🏎️💨

Without worrying about infrastructure since it runs entirely on HF!

docs: https://huggingface.co/docs/huggingface_hub/main/en/guides/jobs
blog: https://huggingface.co/blog/hf-cli

sergiopaniego

posted an update 14 days ago

Post

357

New Zero-Shot Object Detectors in transformers! 🥽

We’ve added LLMDet and MM GroundingDINO, plus a demo Space to compare them with others 🖼️

Play with it: ariG23498/zero-shot-od

sergiopaniego

posted an update 15 days ago

Post

328

Missed last week's OpenAI GPT OSS release?

Here are 2 quick-start recipes we developed to get you up to speed:

🏃‍♀️ How to run gpt-oss-20b on Google Colab
https://cookbook.openai.com/articles/gpt-oss/run-colab

🧑‍🔧 Fine-tuning with gpt-oss and Hugging Face Transformers
https://cookbook.openai.com/articles/gpt-oss/fine-tune-transfomers

sergiopaniego

posted an update 19 days ago

Post

418

Latest TRL release brings major upgrades for multimodal alignment!

We dive into 3 new techniques to improve VLM post-training in our new blog:

🌋 GRPO
🎞️ GSPO
🐙 MPO
➕ vLLM integration for online training w/ transformers backend\

🐡 Blog: https://huggingface.co/blog/trl-vlm-alignment

sergiopaniego

posted an update 20 days ago

Post

2157

OpenAI's open models are out! 💃

Try: https://www.gpt-oss.com/
Learn: https://huggingface.co/blog/welcome-openai-gpt-oss

1 reply

·

sergiopaniego

posted an update 21 days ago

Post

3381

Want to learn how to align a Vision Language Model (VLM) for reasoning using GRPO and TRL? 🌋

🧑‍🍳 We've got you covered!!

NEW multimodal post training recipe to align a VLM using TRL in @HuggingFace 's Cookbook.

Go to the recipe 👉https://huggingface.co/learn/cookbook/fine_tuning_vlm_grpo_trl

Powered by the latest TRL v0.20 release, this recipe shows how to teach Qwen2.5-VL-3B-Instruct to reason over images 🌋

sergiopaniego

posted an update 22 days ago

Post

4485

Just included example scripts for aligning models using GSPO (including VLM example) 🙆‍♂️🙆‍♂️

GSPO is the latest RL alignment algo by @Alibaba_Qwen and it's already supported in the latest TRL v0.20 release.

Super-easy-to-get-started example scripts below, GO run them!👩‍💻👩‍💻

🧑‍🎨 Script: https://github.com/huggingface/trl/blob/main/examples/scripts/gspo.py
🦄 VLM script: https://github.com/huggingface/trl/blob/main/examples/scripts/gspo_vlm.py
🧩 More TRL examples: https://huggingface.co/docs/trl/main/en/example_overview
🧙‍♂️ GSPO paper: Group Sequence Policy Optimization (2507.18071)

sergiopaniego

posted an update 26 days ago

Post

333

Did you miss this? 👓

🧙‍♂️vLLM + transformers integration just got upgraded with direct VLM support.

Select a VLM + model_impl=transformers and play via vLLM!

sergiopaniego

posted an update 28 days ago

Post

2625

We just released TRL v0.20 with major multimodal upgrades!

👁️ VLM support for GRPO (highly requested by the community!)
🎞️ New GSPO trainer (from @Qwen , released last week, VLM-ready)
🐙 New MPO trainer (multimodal by design, as in the paper)

📝 Full release notes here: https://github.com/huggingface/trl/releases/tag/v0.20.0

sergiopaniego

posted an update about 1 month ago

Post

1196

Yet Another New Multimodal Fine-Tuning Recipe 🥧

🧑‍🍳 In this @HuggingFace Face Cookbook notebook, we demonstrate how to align a multimodal model (VLM) using Mixed Preference Optimization (MPO) using trl.

💡 This recipe is powered by the new MPO support in trl, enabled through a recent upgrade to the DPO trainer!

We align the multimodal model using multiple optimization objectives (losses), guided by a preference dataset (chosen vs. rejected multimodal pairs).

Check it out! ➡️ https://huggingface.co/learn/cookbook/fine_tuning_vlm_mpo

2 replies

·

sergiopaniego

posted an update about 1 month ago

Post

1675

🧑‍🍳 New Multimodal Fine-Tuning Recipe 🧑‍🍳

⚡️ In this new @huggingface Cookbook recipe, I walk you though the process of fine tuning a Visual Language Model (VLM) for Object Detection with Visual Grounding, using TRL.

🔍 Object detection typically involves detecting categories in images (e.g., vase).

By combining it with visual grounding, we add contextual understanding so instead of detecting just "vase", we can detect "middle vase" in an image.

VLMs are super powerful!

In this case, I use PaliGemma 2 which already supports object detection and extend it to also add visual grounding.

🤗 Check it out here: https://huggingface.co/learn/cookbook/fine_tuning_vlm_object_detection_grounding

sergiopaniego

posted an update about 1 month ago

Post

1636

Multiple NEW notebooks and scripts added to the Hugging Face Gemma recipes repo!

Thanks to the community 🫶, we're adding more and more recipes using Gemma 💎

Fine tuning for all modalities, function calling, RAG...

Repo: https://github.com/huggingface/huggingface-gemma-recipes

We're also open to new ideas from the community 🤗!

1 reply

·

burtenshaw

posted an update about 1 month ago

Post

1195

Kimi-K2 is ready for general use! In these notebooks I walk you through use cases like function calling and structured outputs.

🔗 burtenshaw/Kimi-K2-notebooks

You can swap it into any OpenAI compatible application via Inference Providers and get to work with an open source model.

1 reply

·

sergiopaniego

posted an update about 1 month ago

Post

409

Loved this paper! ♥️

Authors benchmark multimodal models on vision tasks (detection, segmentation...) using clever prompting tricks.

📄 Results: VLMs are solid generalists but still lag behind SOTA task-specific models — especially on geometric tasks vs. semantic ones.

paper: How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks (2507.01955)

sergiopaniego

posted an update about 1 month ago

Post

277

You can already play with two of the latest most impressive models on HF via @novita-ai as Inference Provider 🚨

🌌 Kimi K2: 1T params model, MoE beast for coding, reasoning and agentic tasks
🔮 GLM-4.1V-9B-Thinking: VLM + deep reasoning model

Kimi K2: moonshotai/Kimi-K2-Instruct
GLM-4.1V-9B-Thinking: https://huggingface.co/THUDM/GLM-4.1V-9B-Thinking

sergiopaniego

posted an update about 1 month ago

Post

238

Over 1K already on @huggingface !!

Hugging Face Agents Course

AI & ML interests

Recent Activity

agents-course/certificates

agents-course/final-certificates

agents-course/course-certificates-of-excellence

agents-course/unit4-students-scores

AI & ML interests

Recent Activity

Team members 9

agents-course's activity