Dominic Nyambane
coderfpv
AI & ML interests
Reinforcement Learning, Robotics
Recent Activity
reacted to burtenshaw's post with 🚀
4 days ago
NEW UNIT in the Hugging Face Reasoning course. We dive deep into the algorithm behind DeepSeek R1 with an advanced and hands-on guide to interpreting GRPO.
https://huggingface.co/reasoning-course
This unit is super useful if you're tuning models with reinforcement learning. It will help with:
- interpreting loss and reward progression during training runs
- selecting effective parameters for training
- reviewing and defining effective reward functions
This unit also works up smoothly toward the existing practical exercises from @mlabonne and Unsloth.
📣 Shout out to @ShirinYamani, who wrote the unit. Follow for more great content.
liked a model
13 days ago
canopylabs/orpheus-3b-0.1-ft
liked a model
15 days ago
google/gemma-3-27b-it
Organizations
Collections
6
models
None public yet
datasets
None public yet