Commit 1d38044 by DavidAU (verified) · 1 parent: afda031

Update README.md

Files changed (1): README.md (+4, -0)
@@ -24,6 +24,10 @@ TWO Example generations (Q4KS, CPU) at the bottom of this page using 16 experts
 This reduces the speed of the model, but uses more "experts" to process your prompts, using 6B (of 30B) parameters instead of 3B (of 30B) parameters. Depending on the application, you may want to
 use the regular model ("30B-A3B") and reserve this model for MORE COMPLEX / "DEEPER" (IE nuance) use case(s).
 
+For the 128k version, NEO Imatrix, with additional modified quants (IQ1_M to Q8 Ultra (f16 experts)) - 52 quants - see:
+
+[ https://huggingface.co/DavidAU/Qwen3-128k-30B-A3B-NEO-MAX-Imatrix-gguf ]
+
 Regular or simpler use cases may benefit from the normal (8 experts), "12 cooks" (12 experts), or "High-Speed" (4 experts) version(s).
 
 Using 16 experts instead of the default 8 will slow token/second speeds by about 1/2 or so.
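
The trade-off the README describes can be sketched with simple arithmetic. This is a back-of-envelope estimate, not a benchmark: the only figures taken from the card are the 8-expert default, the ~3B/~6B active-parameter counts, and the "about 1/2" speed note; the assumption (mine, not the card's) is that active parameters and per-token cost scale roughly linearly with the number of experts used.

```python
# Rough estimate of the experts-vs-speed trade-off for a
# Qwen3-30B-A3B-style MoE model, assuming per-token compute scales
# linearly with the number of experts used (an approximation).

DEFAULT_EXPERTS = 8          # stock "30B-A3B" setting
ACTIVE_PARAMS_DEFAULT_B = 3  # ~3B active parameters at 8 experts

def active_params_b(experts_used: int) -> float:
    """Estimated active parameters (in billions) for a given expert count."""
    return ACTIVE_PARAMS_DEFAULT_B * experts_used / DEFAULT_EXPERTS

def relative_speed(experts_used: int) -> float:
    """Rough tokens/sec relative to the 8-expert default."""
    return DEFAULT_EXPERTS / experts_used

print(active_params_b(16))  # ~6B active, matching the card's 6B (of 30B)
print(relative_speed(16))   # ~0.5x tokens/sec, i.e. the "about 1/2" note
print(relative_speed(4))    # the "High-Speed" 4-expert variant, ~2x
```

Under this linear model, the 12-expert "12 cooks" variant would land in between: ~4.5B active parameters at roughly 2/3 of the default speed.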