sergiopaniego
posted an update about 18 hours ago
Just added example scripts for aligning models using GSPO (including a VLM example) 🙆‍♂️🙆‍♂️

GSPO is the latest RL alignment algo by @Alibaba_Qwen and it's already supported in the latest TRL v0.20 release.

Super-easy-to-get-started example scripts below, GO run them! 👩‍💻👩‍💻

🧑‍🎨 Script: https://github.com/huggingface/trl/blob/main/examples/scripts/gspo.py
🦄 VLM script: https://github.com/huggingface/trl/blob/main/examples/scripts/gspo_vlm.py
🧩 More TRL examples: https://huggingface.co/docs/trl/main/en/example_overview
🧙‍♂️ GSPO paper: Group Sequence Policy Optimization (2507.18071)
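
For anyone curious what "sequence-level" means before opening the scripts, here is a rough, dependency-free sketch of the idea as I read it from the paper: the importance ratio is computed once per sampled response (length-normalized) instead of once per token, then clipped and weighted by a group-normalized advantage. This is not the TRL implementation; the function names, the toy log-probabilities, and the `eps` value are all made up for illustration.

```python
import math


def sequence_importance_ratio(new_logprobs, old_logprobs):
    """Length-normalized sequence-level ratio: exp(mean_t(log p_new - log p_old))."""
    if not new_logprobs or len(new_logprobs) != len(old_logprobs):
        raise ValueError("need two non-empty, equal-length lists of token log-probs")
    diffs = [n - o for n, o in zip(new_logprobs, old_logprobs)]
    return math.exp(sum(diffs) / len(diffs))


def group_advantages(rewards):
    """Normalize rewards within one group of responses to the same prompt."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + 1e-8) for r in rewards]  # 1e-8 avoids division by zero


def gspo_objective(new_logprobs, old_logprobs, rewards, eps=0.2):
    """Clipped surrogate averaged over the group (eps is an illustrative value)."""
    advantages = group_advantages(rewards)
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages):
        ratio = sequence_importance_ratio(new_lp, old_lp)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        total += min(ratio * adv, clipped * adv)
    return total / len(rewards)


# Toy group: two responses to one prompt, with per-token log-probs (hypothetical numbers).
old = [[-1.0, -2.0, -0.5], [-0.7, -1.2]]
new = [[-0.9, -1.9, -0.4], [-0.8, -1.3]]
print(gspo_objective(new, old, rewards=[1.0, 0.0]))
```

In a real training run you would maximize this objective (or minimize its negative) with respect to the policy that produces the new log-probs; the scripts above handle that through TRL.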