Thireus committed · Commit fedd15b · 1 Parent(s): 5618e46
Update README.md
README.md CHANGED
@@ -47,12 +47,12 @@ cd GGUF-Tool-Suite
 rm -f download.conf # Make sure to copy the relevant download.conf for the model before running quant_assign.py
 cp -f models/GLM-4.5/download.conf . # Use the download.conf of the chosen model
 mkdir -p kitchen && cd kitchen
-../quant_downloader.sh ../recipe_examples/GLM-4.5.ROOT-
+../quant_downloader.sh ../recipe_examples/ik_harmonized_recipes/GLM-4.5.ROOT-4.1636bpw-3.2647ppl.173GB-GGUF_12GB-GPU_160GB-CPU.90e3c2f_1ac651c.recipe
 
 # Other recipe examples can be found at https://github.com/Thireus/GGUF-Tool-Suite/tree/main/recipe_examples
 
 # Launch ik_llama's llama-server:
-ulimit -n
+ulimit -n 9999 # Lifts "too many open files" limitation on Linux
 ~/ik_llama.cpp/build/bin/llama-server \
 -m GLM-4.5-THIREUS-BF16-SPECIAL_TENSOR-00001-of-01762.gguf \
 -fa -fmoe -ctk f16 -c 4096 -ngl 99 \
@@ -86,6 +86,8 @@ Here’s how GLM-4.5 quantized with **Thireus’ GGUF Tool Suite** stacks up aga
 
 More perplexity/bpw graphs for other supported models: https://github.com/Thireus/GGUF-Tool-Suite/tree/main/ppl_graphs
 
+*All PPL values are computed with the parameters `-ctk f16 -c 512 -b 4096 -ub 4096`. Changing any of these parameters will alter the PPL. In particular, reducing `-b 4096 -ub 4096` increases the PPL, while increasing them decreases the PPL.*
+
 ---
 
 ## 🚀 How do I get started?
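
The `ulimit -n 9999` line added in the first hunk raises the open-file limit of the current shell before `llama-server` starts. The model filename suggests 1762 shards, so the default soft limit (often 1024 on Linux) may plausibly be too low. The request can fail when the hard limit is below 9999, so a minimal sketch like the following, assuming a POSIX-style shell on Linux, caps the request at the hard limit first. The helper name `pick_limit` is hypothetical and not part of the GGUF Tool Suite.

```shell
#!/usr/bin/env bash
# pick_limit WANTED HARD: print the highest soft limit settable without root.
# (Hypothetical helper for illustration; not part of the GGUF Tool Suite.)
pick_limit() {
  local wanted=$1 hard=$2
  if [ "$hard" = "unlimited" ] || [ "$hard" -ge "$wanted" ]; then
    echo "$wanted"
  else
    echo "$hard"
  fi
}

wanted=9999                 # value used in the README change
soft=$(ulimit -Sn)          # current soft limit
hard=$(ulimit -Hn)          # ceiling for unprivileged users
target=$(pick_limit "$wanted" "$hard")

# Only raise, never lower, the current soft limit.
if [ "$soft" != "unlimited" ] && [ "$soft" -lt "$target" ]; then
  ulimit -Sn "$target"
fi
echo "open-file soft limit: $(ulimit -Sn) (hard limit: $hard)"
```

The limit applies only to the shell that sets it and to its child processes, so this needs to run in the same shell session that launches `llama-server`.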
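
The recipe filename introduced in this commit appears to encode the recipe's headline numbers. The sketch below pulls them out with `grep` and `sed` so recipes can be compared from a script. The field meanings are assumptions read off the filename, not documented behavior: `bpw` as bits per weight, `ppl` as perplexity, and the three `GB` figures as total GGUF size and GPU/CPU memory budgets. The `field` helper is hypothetical.

```shell
#!/usr/bin/env bash
# Recipe filename copied from the diff above.
recipe="GLM-4.5.ROOT-4.1636bpw-3.2647ppl.173GB-GGUF_12GB-GPU_160GB-CPU.90e3c2f_1ac651c.recipe"

# field NAME SUFFIX: print the number that directly precedes SUFFIX in NAME.
# (Hypothetical helper; the naming scheme is inferred, not documented.)
field() {
  printf '%s\n' "$1" | grep -oE "[0-9]+(\.[0-9]+)?$2" | head -n 1 | sed "s/$2\$//"
}

echo "bits per weight (assumed): $(field "$recipe" bpw)"
echo "perplexity (assumed):      $(field "$recipe" ppl)"
echo "GGUF size, GB (assumed):   $(field "$recipe" GB-GGUF)"
echo "GPU budget, GB (assumed):  $(field "$recipe" GB-GPU)"
echo "CPU budget, GB (assumed):  $(field "$recipe" GB-CPU)"
```

If the naming scheme differs for other recipes, `field` prints an empty string instead of failing, so callers should check for empty output.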