Update README.md
README.md (CHANGED)
@@ -24,6 +24,10 @@ TWO Example generations (Q4KS, CPU) at the bottom of this page using 16 experts
This reduces the speed of the model, but uses more "experts" to process your prompts, activating 6B (of 30B) parameters instead of 3B (of 30B). Depending on the application, you may want to use the regular model ("30B-A3B") and reserve this model for MORE COMPLEX / "DEEPER" (i.e. more nuanced) use case(s).

+For the 128k version with NEO Imatrix and additional modified quants (IQ1_M to Q8 Ultra (f16 experts)) - 52 quants - see:
+
+[ https://huggingface.co/DavidAU/Qwen3-128k-30B-A3B-NEO-MAX-Imatrix-gguf ]
+
Regular or simpler use cases may benefit from the normal (8 experts), the "12 cooks" (12 experts), or the "High-Speed" (4 experts) version(s).

Using 16 experts instead of the default 8 will slow token/second speed by roughly half.
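For reference, the active-expert count in a GGUF MoE model can typically be overridden at load time rather than requiring a separate repack. Below is a minimal sketch using llama-cpp-python's `kv_overrides` parameter; the GGUF metadata key `qwen3moe.expert_used_count` and the local filename are assumptions, not confirmed by this model card, so check your file's metadata before relying on them.

```python
# Minimal sketch: load the GGUF with 16 active experts instead of the default 8.
# Assumptions: llama-cpp-python is installed, and this architecture stores its
# active-expert count under "qwen3moe.expert_used_count" (verify in your file's
# metadata). The filename below is hypothetical.
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen3-30B-A3B-Q4_K_S.gguf",  # hypothetical local quant path
    n_ctx=8192,                               # context window for this session
    kv_overrides={
        # 16 experts per token: ~6B active parameters, and expect roughly
        # half the tokens/second of the 8-expert default.
        "qwen3moe.expert_used_count": 16,
    },
)

out = llm(
    "Summarize the trade-off between 8 and 16 active experts.",
    max_tokens=256,
)
print(out["choices"][0]["text"])
```

The same kind of override is available from the llama.cpp CLI via `--override-kv` (e.g. `--override-kv qwen3moe.expert_used_count=int:16`), which is useful if you are not calling the model from Python.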