Update README.md
README.md (CHANGED)
@@ -24,6 +24,10 @@ TWO Example generations (Q4KS, CPU) at the bottom of this page using 16 experts
This reduces the speed of the model, but uses more "experts" to process your prompts, activating 6B (of 30B) parameters instead of 3B (of 30B). Depending on the application, you may want to use the regular model ("30B-A3B") and reserve this model for MORE COMPLEX / "DEEPER" (i.e. more nuanced) use case(s).

+For the 128k version with NEO Imatrix and additional modified quants (IQ1_M to Q8 Ultra (f16 experts)) - 52 quants - see:
+
+[ https://huggingface.co/DavidAU/Qwen3-128k-30B-A3B-NEO-MAX-Imatrix-gguf ]
+
Regular or simpler use cases may benefit from the normal (8 experts), the "12 cooks" (12 experts), or the "High-Speed" (4 experts) version(s).

Using 16 experts instead of the default 8 will slow token/second speed by roughly half.
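For reference, the active-expert count in a GGUF MoE model can typically be overridden at load time rather than requiring a separate repack. Below is a minimal sketch using llama-cpp-python's `kv_overrides` parameter; the GGUF metadata key `qwen3moe.expert_used_count` and the local filename are assumptions, not confirmed by this model card, so check your file's metadata before relying on them.

```python
# Minimal sketch: load the GGUF with 16 active experts instead of the default 8.
# Assumptions: llama-cpp-python is installed, and this architecture stores its
# active-expert count under "qwen3moe.expert_used_count" (verify in your file's
# metadata). The filename below is hypothetical.
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen3-30B-A3B-Q4_K_S.gguf",  # hypothetical local quant path
    n_ctx=8192,                               # context window for this session
    kv_overrides={
        # 16 experts per token: ~6B active parameters, and expect roughly
        # half the tokens/second of the 8-expert default.
        "qwen3moe.expert_used_count": 16,
    },
)

out = llm(
    "Summarize the trade-off between 8 and 16 active experts.",
    max_tokens=256,
)
print(out["choices"][0]["text"])
```

The same kind of override is available from the llama.cpp CLI via `--override-kv` (e.g. `--override-kv qwen3moe.expert_used_count=int:16`), which is useful if you are not calling the model from Python.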