@danielhanchen on Hugging Face: "You can now do reinforcement learning training with 7× longer context and no…"

danielhanchen

posted an update about 1 month ago

You can now run reinforcement learning (RL) training with 7× longer context and no accuracy loss, thanks to our new batching algorithms.

Long reasoning chains make RL costly, but you can now train gpt-oss with GRPO and reach 380K context on a 192GB GPU.
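For readers new to GRPO, here is a minimal sketch of its group-relative advantage step: several completions are sampled for the same prompt, and each reward is normalized against the group's mean and standard deviation. This is the generic GRPO idea only, not our batching algorithms. The function name `group_relative_advantages`, the `eps` value, and the use of the population standard deviation are assumptions for illustration, and real implementations may differ.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for the same prompt.

    Hypothetical helper for illustration; not part of any library API.
    """
    mu = mean(rewards)
    # Assumption: population standard deviation; some implementations
    # use the sample standard deviation instead.
    sigma = pstdev(rewards)
    # eps keeps the division finite when every reward in the group is equal.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for one prompt: two rewarded, two not.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Completions that score above the group mean get a positive advantage and the rest get a negative one, so no separate value model is needed to form the baseline.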

Blog: https://unsloth.ai/docs/new/grpo-long-context

danielhanchen Daniel (Unsloth)