AI & ML interests

None defined yet.

Recent Activity

JofthomasΒ  updated a dataset about 9 hours ago
agents-course/unit4-students-scores
sergiopaniegoΒ  updated a dataset about 9 hours ago
agents-course/final-certificates
View all activity

sergiopaniegoΒ 
posted an update about 1 hour ago
sergiopaniegoΒ 
posted an update 5 days ago
sergiopaniegoΒ 
posted an update 9 days ago
view post
Post
395
if you're looking for a good first issue to get your open-source journey started, you could contribute to this TRL issue by documenting one impactful paper in the docs

we have a broad list to cover!! 🧐

https://github.com/huggingface/trl/issues/4407
sergiopaniegoΒ 
posted an update 20 days ago
view post
Post
466
Meet the Post-Training Toolkit (PTT), which easily integrates with TRL via a single callback, by Aditya Challapally ( @microsoft ):

πŸ” Detects training issues early
πŸ›  Lets you intervene safely
πŸ“Š Keeps long training runs stable, auditable & efficient

Microsoft blog: https://devblogs.microsoft.com/engineering-at-microsoft/diagnosing-instability-in-production-scale-agent-rl/

Integration guide: https://huggingface.co/docs/trl/main/en/ptt_integration

Code: https://github.com/microsoft/post-training-toolkit
sergiopaniegoΒ 
posted an update 21 days ago
sergiopaniegoΒ 
posted an update 23 days ago
sergiopaniegoΒ 
posted an update about 1 month ago
view post
Post
1622
FunctionGemma Tuning Lab is a new no-code tool by @google that lets you fine-tune a model directly from the browser, with no coding knowledge required, using TRL behind the scenes.

blog: https://developers.googleblog.com/a-guide-to-fine-tuning-functiongemma/

try it out: google/functiongemma-tuning-lab

This example builds on a more advanced one for learning fine-tuning with SFT using TRL: https://ai.google.dev/gemma/docs/functiongemma/finetuning-with-functiongemma
  • 1 reply
Β·
sergiopaniegoΒ 
posted an update about 1 month ago
view post
Post
835
TRL v0.27.0 is out!! πŸ₯³

It includes GDPO, the latest variant of GRPO for multi-reward RL ✨
GDPO decouples reward normalization to avoid reward collapse and improve per-reward convergence β€” developed by
@sliuau @SimonX et al.

Explore the paper: GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization (2601.05242)

Explore the full set of changes here:
https://github.com/huggingface/trl/releases/tag/v0.27.0
sergiopaniegoΒ 
posted an update about 1 month ago
view post
Post
3026
New REPL environment in OpenEnv available! ✨
Used in the Recursive Language Models (RLM) paper by Alex Zhang.

Ready for inference & post-training using trajectories. Handles long contexts:

> Run Python code in a sandbox
> Make recursive calls to LMs
> Explore data programmatically
> Return final result

Docs: https://meta-pytorch.org/OpenEnv/environments/repl/
Inference script: https://github.com/meta-pytorch/OpenEnv/blob/main/examples/repl_oolong_simple.py
sergiopaniegoΒ 
posted an update about 1 month ago
view post
Post
550
Recursive Language Models (RLM) is a new interface for LLMs with cool ideas by Alex Zhang!

⚠️ LLMs struggle with long prompts β†’ attention overload & lost info
πŸ”„ RLMs inspect, split & call themselves on chunks, then aggregate results
βœ… Handles millions of tokens, reduces noise, improves reasoning
πŸ’‘ System prompt guides recursion
🎯 RLM trajectories can be used for RL training or distillation (OpenEnv+TRL!!)

We're adding it to OpenEnv (with Kashif Rasul): https://github.com/meta-pytorch/OpenEnv/pull/282

More resources:

> Paper: Recursive Language Models (2512.24601)
> Paper blog: https://alexzhang13.github.io/blog/2025/rlm/
> RLM repo: https://github.com/alexzhang13/rlm
  • 2 replies
Β·
sergiopaniegoΒ 
posted an update about 1 month ago
pcuenqΒ 
posted an update about 1 month ago
view post
Post
3399
πŸ‘‰ What happened in AI in 2025? πŸ‘ˆ

We prepared the 2025 version of the HF AI Timeline Grid, highlighting open vs API-based model releases, and allowing you to browse and filter by access, modality, and release type!

Play with it here:
2025-ai-timeline/2025-ai-timeline

Here's my personal quarterly TL;DR:

1️⃣ Q1 β€” Learning to Reason
Deepseek not only releases a top-notch reasoning model, but shows how to train them and compete with closed frontier models. OpenAI debuts Deep Research.

Significant milestones: DeepSeek R1 & R1-Zero, Qwen 2.5 VL, OpenAI Deep Research, Gemini 2.5 Pro (experimental)

2️⃣ Q2 β€” Multimodality and Coding
More LLMs embrace multimodality by default, and there's a surge in coding agents. Strong vision, audio, and generative models emerge.

Significant milestones: Llama 4, Qwen 3, Imagen 4, OpenAI Codex, Google Jules, Claude 4

3️⃣ Q3 β€” "Gold" rush, OpenAI opens up, the community goes bananas
Flagship models get gold in Math olympiads and hard benchmarks. OpenAI releases strong open source models and Google releases the much anticipated nano-banana for image generation and editing. Agentic workflows become commonplace.

Significant milestones: Gemini and OpenAI IMO Gold, gpt-oss, Gemini 2.5 Flash Image, Grok 4, Claude Sonnet 4.5

4️⃣ Q4 β€” Mistral returns, leaderboard hill-climbing
Mistral is back with updated model families. All labs release impressive models to wrap up the year!

Significant milestones: Claude Opus 4.5, DeepSeek Math V2, FLUX 2, GPT 5.1, Kimi K2 Thinking, Nano Banana Pro, GLM 4.7, Gemini 3, Mistral 3, MiniMax M2.1 🀯

Credits
πŸ™ NHLOCAL for the source data https://github.com/NHLOCAL/AiTimeline

🫑 @reach-vb for the original idea, design and recipe

πŸ™Œ @ariG23498 and yours truly for compiling and verifying the 2025 edition

πŸ₯³ Here's to 2026, wishing it becomes the best year ever for open releases and on-device-first use-cases! πŸ₯‚
  • 2 replies
Β·
sergiopaniegoΒ 
posted an update about 2 months ago
view post
Post
2613
The list of hands-on notebooks (some beginner-friendly!) to get started with fine-tuning using TRL keeps growing!!

β€’ SFT
β€’ GRPO
β€’ Tool calling & agents
β€’ RL environments with OpenEnv
β€’ LLMs and VLMs
✨ Many run on FREE Colab, making it super easy to get started fast!

https://github.com/huggingface/trl/tree/main/examples/notebooks
sergiopaniegoΒ 
posted an update about 2 months ago
sergiopaniegoΒ 
posted an update about 2 months ago