Long context length
#1
by SuperSonnix71
40,960 is a pretty big context length... Unsloth with Transformers 4.53.0, a high RoPE theta (1,000,000), and Qwen3RMSNorm throughout, including on the attention projections (q_norm and k_norm). Awesome!
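For reference, here's a minimal sketch of what that per-head QK normalization looks like (the shapes and eps are illustrative, not taken from this model's config; the actual implementation is in the Qwen3 attention module in Transformers):

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Minimal RMSNorm over the last dimension, Qwen3RMSNorm-style."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        dtype = x.dtype
        x = x.to(torch.float32)
        x = x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return self.weight * x.to(dtype)

# q_norm / k_norm normalize each attention head over head_dim,
# before RoPE is applied (shapes here are illustrative).
batch, heads, seq, head_dim = 1, 8, 16, 128
q_norm, k_norm = RMSNorm(head_dim), RMSNorm(head_dim)
q = q_norm(torch.randn(batch, heads, seq, head_dim))
k = k_norm(torch.randn(batch, heads, seq, head_dim))
```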
Just like Qwen3, you can easily extend its context to 128K (131,072) tokens using YaRN.
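Something like this should work with Transformers, following the YaRN recipe from the Qwen3 model cards (the checkpoint name below is a placeholder; substitute whatever model this thread is about):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint name.
model_name = "Qwen/Qwen3-8B"

tokenizer = AutoTokenizer.from_pretrained(model_name)

# Override rope_scaling at load time to enable YaRN:
# factor 4.0 x 32,768 native positions = 131,072 tokens.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
    max_position_embeddings=131072,
    rope_scaling={
        "rope_type": "yarn",
        "factor": 4.0,
        "original_max_position_embeddings": 32768,
    },
)
```

One caveat: Transformers applies YaRN statically, so the scaling factor is in effect even for short inputs, which can slightly hurt quality there. It's usually best to enable it only when you actually need the longer context.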