Thireus committed
Commit · edc23d6 · 1 Parent(s): 5f38f56
Update README.md

README.md CHANGED
@@ -12,7 +12,7 @@ This repository provides **GGUF-quantized tensors** for the DeepSeek-V3-0324 mod
 - 🛠️ Create your own recipe: https://colab.research.google.com/github/Thireus/GGUF-Tool-Suite/blob/main/quant_recipe_pipeline.ipynb
 - 📂 Browse available quant shards: https://huggingface.co/Thireus/collections
 
-*tl;dr
+*tl;dr: Expand the details section below*
 <details>
 
 ```
@@ -40,6 +40,8 @@ cp -f models/DeepSeek-R1-0528/download.conf . # Use the download.conf of the cho
 mkdir -p kitchen && cd kitchen
 ../quant_downloader.sh ../recipe_examples/DeepSeek-R1-0528.THIREUS-1.9364bpw-4.3533ppl.151GB-GGUF_11GB-GPU_140GB-CPU.3c88ec6_9fd615d.recipe
 
+# Other recipe examples can be found at https://github.com/Thireus/GGUF-Tool-Suite/tree/main/recipe_examples
+
 # Launch ik_llama's llama-cli:
 ulimit -n 99999 # Lifts "too many open files" limitation on Linux
 ~/ik_llama.cpp/build/bin/llama-cli \
@@ -60,7 +62,7 @@ ulimit -n 99999 # Lifts "too many open files" limitation on Linux
 
 1. **Compatibility & Speed** – [unsloth](https://huggingface.co/unsloth)’s dynamic quants may not always work optimally with `ik_llama.cpp`.
 2. **Custom Rig Fit** – No off-the-shelf GGUF model perfectly matched my VRAM/RAM setup, so I built a way to tailor models and leverage extra VRAM/RAM to reduce perplexity.
-3. **Automated PPL-Optimal Quantization** – To my knowledge, there was no flexible, automated method to minimize perplexity for any bits-per-weight (bpw) target—so I created one with excellent results!
+3. **Automated PPL-Optimal Quantization** – To my knowledge, there was no open-source, flexible, automated method to minimize perplexity for any bits-per-weight (bpw) target—so I created one with excellent results!
 
 ---
 
@@ -85,7 +87,7 @@ Check out the [GGUF Tool Suite README](https://github.com/Thireus/GGUF-Tool-Suit
 2. 📥 **Download Model Shards** – Use `quant_downloader.sh` to fetch GGUF shards from any recipe.
 - Recipe examples: https://github.com/Thireus/GGUF-Tool-Suite/tree/main/recipe_examples
 3. 🧠 **Run a Downloaded Model** – Sample usage with `llama-cli`.
-4. 🛠️ **Generate a Custom Recipe** – Produce recipes tailored to your
+4. 🛠️ **Generate a Custom Recipe** – Produce recipes tailored to your VRAM/RAM target usage for optimum perplexity.
 
 ---
 
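Steps 2 and 3 in the list above correspond to the commands already shown in the quick-start snippet earlier in this diff. A minimal sketch, reusing that example recipe filename and assuming the same `~/ik_llama.cpp` build path (the model shard name and prompt are illustrative placeholders):

```
# Step 2 – fetch the GGUF shards listed in a recipe (example recipe from the snippet above)
mkdir -p kitchen && cd kitchen
../quant_downloader.sh ../recipe_examples/DeepSeek-R1-0528.THIREUS-1.9364bpw-4.3533ppl.151GB-GGUF_11GB-GPU_140GB-CPU.3c88ec6_9fd615d.recipe

# Step 3 – run the downloaded model with llama-cli (flags shown are illustrative)
ulimit -n 99999   # lifts the "too many open files" limit on Linux
~/ik_llama.cpp/build/bin/llama-cli -m ./DeepSeek-R1-0528-THIREUS-00001-of-XXXXX.gguf -p "Hello"
```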
@@ -95,11 +97,11 @@ Supported models are listed under `models/` in the [Tool Suite Github repo](http
 
 ---
 
-## 🤷♂️ Will I release
+## 🤷♂️ Will I release baked dynamic quant GGUFs?
 
-No, because I believe in **tailored quantization** for each user’s hardware. If you prefer ready-made shards, you are welcome to merge them via `llama-gguf-split --merge`, or request someone to publish them.
+No, because I believe in **tailored quantization** for each user’s hardware. If you prefer ready-made shards, you are welcome to merge them via `llama-gguf-split --merge`, request someone to publish them, or rely on generic GGUF dynamic quants such as [unsloth](https://huggingface.co/unsloth)'s.
 
-Instead, I prefer to share examples of recipes so users can see exactly how they were produced (command included inside these recipe files) and tweak them for their own rigs. The `quant_downloader.sh` script handles automatic fetching and verification of each shard.
+Instead, I prefer to share example recipes so users can see exactly how they were produced (the command used is included inside each recipe file) and tweak them for their own rigs. The `quant_downloader.sh` script handles automatic fetching and verification of each shard. Note that recipes provided by [Ubergarm](https://huggingface.co/ubergarm) on his model cards are also compatible with `quant_downloader.sh`.
 
 Users who don’t trust the GGUF shards on HuggingFace can also quantize their own by passing recipe lines to `llama-quantize --custom-q` ([see example](https://github.com/Thireus/GGUF-Tool-Suite/blob/main/models/DeepSeek-R1-0528/DeepSeek-R1-0528-THIREUS-ANY-SPECIAL.sh#L482-L486)). Run `llama-quantize --help` to list compatible quants for `quant_assign.py`. This approach is especially useful if you prefer `llama.cpp` over `ik_llama.cpp`.
 
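To make the two options in this hunk concrete, here is a hedged sketch. The merge call uses the standard `llama-gguf-split --merge <first-shard> <output>` form; the self-quantization line assumes `--custom-q` accepts comma-separated `tensor-regex=QTYPE` pairs built from recipe lines, as in the linked `DeepSeek-R1-0528-THIREUS-ANY-SPECIAL.sh` example. Filenames, regexes, and quant types below are placeholders; `llama-quantize --help` has the authoritative syntax.

```
# Option 1 – merge downloaded shards into a single GGUF (filenames are placeholders)
~/ik_llama.cpp/build/bin/llama-gguf-split --merge \
  DeepSeek-R1-0528-THIREUS-00001-of-XXXXX.gguf DeepSeek-R1-0528-merged.gguf

# Option 2 – quantize your own shards by passing recipe lines to --custom-q
# (regexes and quant types are illustrative; take the real ones from a recipe file)
~/ik_llama.cpp/build/bin/llama-quantize \
  --custom-q "blk\.[0-9]+\.ffn_down_exps\.weight=iq3_k,blk\.[0-9]+\.attn_.*=q8_0" \
  DeepSeek-R1-0528-BF16.gguf DeepSeek-R1-0528-custom.gguf q8_0
```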
@@ -116,7 +118,7 @@ Users who don’t trust the GGUF shards on HuggingFace can also quantize their o
 
 ## 💡 Pro Tips
 
-You can download the BF16 model version to quantize your own shards:
+You can easily download the BF16 model version to quantize your own shards:
 
 ```
 mkdir kitchen