Any reason to not use this model over the 256K context model?
If I understand correctly, the only difference between the 1-million-context upload and the regular upload is that the 1M upload has a YaRN scaling factor set to 4.
This allows it to handle 1 million tokens even though it was not trained to do so. My question is: is there any reason not to just use this model and ignore the 256K one, even if I am not going to be using the full 1 million context?
If both models have the same performance but one can handle one million tokens, I might as well just download that one.
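For concreteness, here is a minimal sketch of what I understand that difference to look like in a transformers-style config.json, written out as Python dicts (all values are illustrative guesses, not copied from the actual uploads):

```python
# Standard upload: no "rope_scaling" entry, so the context window is
# whatever the model was trained for (assumed 256K here).
standard_config = {
    "max_position_embeddings": 262144,
}

# 1M upload: identical except for a static YaRN block with factor 4,
# stretching RoPE so 262144 * 4 = 1048576 (~1M) positions fit.
long_context_config = {
    "max_position_embeddings": 1048576,
    "rope_scaling": {
        "type": "yarn",
        "factor": 4.0,
        "original_max_position_embeddings": 262144,
    },
}
```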
https://huggingface.co/Qwen/Qwen2.5-7B-Instruct#processing-long-texts
It says static YaRN can degrade performance on shorter inputs. Not sure whether that applies to this case, though.
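As I read that section, "static" means the scaling factor is applied to every input, long or short, which is why short prompts can suffer. A sketch of the workaround the README suggests (enable YaRN only when you actually need the extra length), using a hypothetical helper and assuming the factor-4 setup from the question:

```python
from transformers import AutoConfig, AutoModelForCausalLM

def load_with_optional_yarn(model_id: str, max_input_tokens: int):
    """Hypothetical helper: enable static YaRN only when the workload
    exceeds the model's native context window."""
    config = AutoConfig.from_pretrained(model_id)
    native_ctx = config.max_position_embeddings
    if max_input_tokens > native_ctx:
        # Once set, static YaRN scales RoPE for ALL inputs, including
        # short ones (the trade-off the Qwen README warns about).
        config.rope_scaling = {
            "type": "yarn",
            "factor": 4.0,  # assumption: the factor the 1M upload uses
            "original_max_position_embeddings": native_ctx,
        }
    return AutoModelForCausalLM.from_pretrained(model_id, config=config)

# e.g. a short-context workload keeps the vanilla config:
# model = load_with_optional_yarn("Qwen/Qwen2.5-7B-Instruct", 8192)
```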
The biggest reason the 1M uploads differ from our normal GGUFs is that we use very long examples in our calibration dataset.
In general, if you're not using context lengths over 256K, we'd recommend using the standard one.
CC: @CHNtentes @mallorbc @owao @auggie246 @rboehme86 @ijohn07