burtenshaw posted an update 22 days ago
Here’s a notebook to make Gemma reason with GRPO & TRL. I made this whilst prepping the next unit of the reasoning course:

In this notebook, I combine Google's model with some community tooling:

- First, I load the model from the Hugging Face Hub with transformers' latest release, which supports Gemma 3
- I use PEFT and bitsandbytes to get it running on Colab
- Then, I use Will Brown's processing and reward functions to build reasoning chains from GSM8K
- Finally, I use TRL's GRPOTrainer to train the model

Next step is to bring Unsloth AI in, then ship it in the reasoning course. Link to the notebook below.

https://colab.research.google.com/drive/1Vkl69ytCS3bvOtV9_stRETMthlQXR4wX?usp=sharing

Bruh, it's been 8 hours since the announcement. Chill out, guys.

Thanks, I needed this.

Crazy that this is a day 0 release.

I've found that TRL's GRPO implementation is very memory-hungry. There are already various alternative implementations out there that seem much faster and more lightweight; Unsloth is advertising a factor of 10 less memory, which is insane. Can we expect something similar for the TRL implementation in the near future?

I have combined the Gymnasium RL library with GRPO here to see if you can teach a small model to drive a taxi. This already took around 70 GB for the 1.5B model.

BTW: could the Gymnasium RL library be helpful for new/better reasoning models (and new benchmarks)?

https://github.com/chenkel-data/grpo-taxi
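As a sketch of what the prompting side of such an experiment might look like (hypothetical helpers, not taken from the linked repo): Gymnasium's Taxi-v3 encodes its 500 discrete states as `((taxi_row * 5 + taxi_col) * 5 + passenger_location) * 4 + destination`, which can be decoded into a text observation for a language model.

```python
# Decode a Gymnasium Taxi-v3 state into a text prompt for a language model.
# State encoding (from the Taxi-v3 docs):
#   state = ((taxi_row * 5 + taxi_col) * 5 + passenger_location) * 4 + destination
# Passenger location 4 means the passenger is inside the taxi.
LOCATIONS = ["Red", "Green", "Yellow", "Blue"]  # fixed landmark names in Taxi-v3

def decode_state(state: int) -> dict:
    """Invert the Taxi-v3 state encoding into its four components."""
    destination = state % 4
    state //= 4
    passenger = state % 5
    state //= 5
    col = state % 5
    row = state // 5
    return {"row": row, "col": col, "passenger": passenger, "destination": destination}

def state_to_prompt(state: int) -> str:
    """Render a decoded state as a natural-language prompt."""
    s = decode_state(state)
    passenger = "in the taxi" if s["passenger"] == 4 else f"at {LOCATIONS[s['passenger']]}"
    return (
        f"The taxi is at row {s['row']}, column {s['col']}. "
        f"The passenger is {passenger}; the destination is {LOCATIONS[s['destination']]}. "
        "Choose an action: south, north, east, west, pickup, or dropoff."
    )
```

A GRPO reward function for this task could then parse the model's chosen action out of the completion and score it against the environment's returned reward.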