Join the conversation

Join the community of Machine Learners and AI enthusiasts.

Sign Up
danielhanchen 
posted an update about 1 month ago
Post
2847
You can now do reinforcement learning training with 7× longer context and no accuracy loss, via our new batching algorithms.

Long reasoning chains in RL are costly, but now we enable you to train gpt-oss with GRPO & reach 380K context on a 192GB GPU.

Blog: https://unsloth.ai/docs/new/grpo-long-context