	# Paper Index

	<Tip warning={true}>

	Section under construction. Feel free to contribute!

	</Tip>

	## Group Sequence Policy Optimization

	📜 Paper: https://huggingface.co/papers/2507.18071

	GSPO is a GRPO variant that computes importance sampling weights at the sequence level rather than per token. To reproduce the paper's setting, use this configuration:

	```python
	from trl import GRPOConfig

	training_args = GRPOConfig(
	    importance_sampling_level="sequence",
	    loss_type="grpo",
	    beta=0.0,  # GSPO sets KL regularization to zero: https://github.com/volcengine/verl/pull/2775#issuecomment-3131807306
	    epsilon=3e-4,  # GSPO paper (v2), section 5.1
	    epsilon_high=4e-4,  # GSPO paper (v2), section 5.1
	    gradient_accumulation_steps=1,
	    steps_per_generation=4,  # Partition the rollout batch into 4 mini-batches. GSPO paper (v2), section 5.1. Must be 4 times gradient_accumulation_steps
	)
	```
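
To make the difference between the two importance sampling levels concrete, here is a minimal sketch in plain Python. It is not TRL's implementation: the function names and the toy log-probabilities are hypothetical, and it assumes the length-normalized sequence ratio described in the GSPO paper (the exponential of the mean per-token log-ratio). The clipping helper only illustrates the range implied by `epsilon` and `epsilon_high` above.

```python
import math


def token_level_ratios(new_logps, old_logps):
    """Per-token importance ratios: one weight per generated token."""
    return [math.exp(n - o) for n, o in zip(new_logps, old_logps)]


def sequence_level_ratio(new_logps, old_logps):
    """Single length-normalized ratio for the whole sequence.

    Computed as exp(mean of per-token log-ratios), i.e. the geometric
    mean of the token-level ratios.
    """
    if len(new_logps) != len(old_logps) or not new_logps:
        raise ValueError("log-prob lists must be non-empty and of equal length")
    log_ratios = [n - o for n, o in zip(new_logps, old_logps)]
    return math.exp(sum(log_ratios) / len(log_ratios))


def clip_ratio(ratio, epsilon=3e-4, epsilon_high=4e-4):
    """Clamp a ratio to [1 - epsilon, 1 + epsilon_high]."""
    return min(max(ratio, 1.0 - epsilon), 1.0 + epsilon_high)


# Hypothetical per-token log-probabilities for one 3-token completion.
new_logps = [-1.0, -2.0, -0.5]  # under the current policy
old_logps = [-1.2, -1.8, -0.5]  # under the policy that generated the rollout

token_ratios = token_level_ratios(new_logps, old_logps)
seq_ratio = sequence_level_ratio(new_logps, old_logps)
clipped_seq_ratio = clip_ratio(seq_ratio)

print("token-level ratios:", token_ratios)
print("sequence-level ratio:", seq_ratio)
print("clipped sequence-level ratio:", clipped_seq_ratio)
```

In this toy example the token-level ratios differ from one another, while the sequence-level computation yields one shared weight for every token of the completion.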