Just included example scripts for aligning models using GSPO (including a VLM example)
GSPO is the latest RL alignment algo by @Alibaba_Qwen and it's already supported in the latest TRL v0.20 release.
Super-easy-to-get-started example scripts below, GO run them!
Script: https://github.com/huggingface/trl/blob/main/examples/scripts/gspo.py
VLM script: https://github.com/huggingface/trl/blob/main/examples/scripts/gspo_vlm.py
More TRL examples: https://huggingface.co/docs/trl/main/en/example_overview
GSPO paper: Group Sequence Policy Optimization (2507.18071)
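For anyone curious what sets GSPO apart before opening the scripts, here is a rough sketch of the core idea as I read the paper: the importance ratio is computed once per sequence (length-normalized) rather than per token, and clipping is applied to that sequence-level ratio. This is plain Python for illustration only, not TRL's implementation. The function names and the `epsilon=0.2` default are hypothetical placeholders I picked for the example.

```python
import math


def sequence_importance_ratio(new_logprobs, old_logprobs):
    """Length-normalized sequence-level ratio: exp(mean_t(new_t - old_t)).

    Inputs are per-token log-probabilities of one sampled response under
    the current policy and the old (sampling) policy.
    """
    if not new_logprobs or len(new_logprobs) != len(old_logprobs):
        raise ValueError("log-prob lists must be non-empty and equal length")
    mean_diff = sum(n - o for n, o in zip(new_logprobs, old_logprobs)) / len(new_logprobs)
    return math.exp(mean_diff)


def clipped_sequence_objective(ratio, advantage, epsilon=0.2):
    """PPO-style clipped term applied to the whole sequence.

    epsilon is an illustrative value (assumption), not a recommended setting.
    """
    clipped = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
    return min(ratio * advantage, clipped * advantage)


# One response with three tokens; the new policy is slightly more confident.
ratio = sequence_importance_ratio([-0.9, -1.8, -0.4], [-1.0, -2.0, -0.5])
objective = clipped_sequence_objective(ratio, advantage=1.0)
```

Every token in a response shares the same ratio, so a whole response is either clipped or not. The example scripts linked above are the place to look for the real training setup.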