Dude's going nuts with Qwen3
Are we getting 30B-A30B tomorrow at this rate? hahaha!
By the way I had really good results with your neo quants for qwen3. Tried 4b up to 32b and so far they're super accurate! I'm surprised, even the 4b model can one-shot a three.js spinning donut. Good job on preserving model quality!
30 / 30 ... MAYBE !!! lol
Thanks ;
Good to hear about the NEO quants ; there is a wide margin (of difference) between the NEO and HORROR imatrix quants.
Unclear why ; except the imatrix is having a stronger effect than expected.
There is a lot of testing / tweaking and such to do yet. A little overwhelming, the... options.
Team Qwen really knocked it out of the park - on all models.
Hats off to them.
PS: 256K context maybe next?
gpt 4.6
I always wanted more active experts, and I wish it were much easier to do. So, thanks!
I would kind of love to have a 30/30 at 32k, 128k and 256k just for shits and giggles lol. I noticed that for certain tasks like coding, RoPE scaling seems to hurt model performance more than I thought; so I'd say 32k for coherence, 128k for when we really need it, and 256k just because you can hahahaha!
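For anyone who wants to experiment with the active-expert count themselves, here's a rough sketch of how I'd try it with transformers (the repo id, the `num_experts_per_tok` field name, and the default of 8 are assumptions from the Qwen MoE configs I've seen; check the model's actual config.json):

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Assumed repo id -- swap in whichever Qwen3 MoE checkpoint you actually run.
model_id = "Qwen/Qwen3-30B-A3B"

config = AutoConfig.from_pretrained(model_id)
# Qwen MoE-style configs expose how many routed experts fire per token.
# Raising it activates more experts per forward pass (more compute, hopefully more quality).
print("default active experts:", config.num_experts_per_tok)  # typically 8 for this family
config.num_experts_per_tok = 16  # e.g. double the active experts

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    config=config,
    torch_dtype="auto",
    device_map="auto",
)
```

More active experts means more compute per token, so expect slower generation in exchange for whatever quality it buys.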
Hear you on all counts ; maybe 512k?
Qwen 8B tests at 256K ; it holds up ... some loss of function (but not as much as 4B @ 256K).
With a MOE... more experts to "clean up the mess" made by context ah... experiments... (?)
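If anyone wants to reproduce the long-context runs, this is roughly how I'd stretch the window with YaRN rope scaling (the repo id, the 32K native window, and the exact rope_scaling keys are assumptions based on published Qwen configs; check the model card before relying on them):

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Assumed repo id -- any Qwen3 dense model with a ~32K native window.
model_id = "Qwen/Qwen3-8B"

config = AutoConfig.from_pretrained(model_id)
# YaRN rope scaling: stretch the assumed 32K native window by 8x -> ~262K tokens.
config.rope_scaling = {
    "rope_type": "yarn",
    "factor": 8.0,
    "original_max_position_embeddings": 32768,
}
config.max_position_embeddings = 262144

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    config=config,
    torch_dtype="auto",
    device_map="auto",
)
```

The "loss of function" at the bigger stretch factors is the same RoPE degradation mentioned above, which is why the smaller windows (32K / 128K) stay more coherent.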
@ThijsL202
100% ;
That being said, too many cooks can spoil the "meal" ... but I want to cook, the way I want to cook.
As an aside: if this model were fine-tuned with all / most of the experts on a single / branching domain (i.e. all medical, general + specialty),
then more experts / expert control would be a positive.