Thireus committed
Commit edc23d6 · 1 Parent(s): 5f38f56

Update README.md

Files changed (1):
  1. README.md +9 -7
README.md CHANGED
@@ -12,7 +12,7 @@ This repository provides **GGUF-quantized tensors** for the DeepSeek-V3-0324 mod
 - 🛠️ Create your own recipe: https://colab.research.google.com/github/Thireus/GGUF-Tool-Suite/blob/main/quant_recipe_pipeline.ipynb
 - 📂 Browse available quant shards: https://huggingface.co/Thireus/collections
 
-*tl;dr:*
+*tl;dr: Expand the details section below*
 <details>
 
 ```
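The tl;dr block opens just above, but its first lines fall outside this diff. Purely as orientation, a sketch of what those elided steps presumably involve; the clone URL is the GGUF Tool Suite repository already linked in this README, everything else is an assumption rather than text from the commit:

```
# Assumed setup preceding the downloader steps below; not taken from the README itself
git clone https://github.com/Thireus/GGUF-Tool-Suite
cd GGUF-Tool-Suite
```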
@@ -40,6 +40,8 @@ cp -f models/DeepSeek-R1-0528/download.conf . # Use the download.conf of the cho
 mkdir -p kitchen && cd kitchen
 ../quant_downloader.sh ../recipe_examples/DeepSeek-R1-0528.THIREUS-1.9364bpw-4.3533ppl.151GB-GGUF_11GB-GPU_140GB-CPU.3c88ec6_9fd615d.recipe
 
+# Other recipe examples can be found at https://github.com/Thireus/GGUF-Tool-Suite/tree/main/recipe_examples
+
 # Launch ik_llama's llama-cli:
 ulimit -n 99999 # Lifts "too many open files" limitation on Linux
 ~/ik_llama.cpp/build/bin/llama-cli \
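For orientation when looking at those recipe examples: a recipe file is plain text whose lines map tensor-name regexes to quant types, which is also the format `llama-quantize --custom-q` accepts as noted later in the README. The regexes and quant types below are illustrative placeholders, not lines taken from a published recipe:

```
# Hypothetical excerpt of a *.recipe file; regexes and quant types are examples only
output\.weight=q8_0
blk\.([0-9]+)\.attn_.*\.weight=iq5_k
blk\.([0-9]+)\.ffn_.*_exps\.weight=iq3_k
```

The `quant_downloader.sh` script then fetches the shards matching those entries and verifies them, as described later in the README.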
@@ -60,7 +62,7 @@ ulimit -n 99999 # Lifts "too many open files" limitation on Linux
 
 1. **Compatibility & Speed** – [unsloth](https://huggingface.co/unsloth)’s dynamic quants may not always work optimally with `ik_llama.cpp`.
 2. **Custom Rig Fit** – No off-the-shelf GGUF model perfectly matched my VRAM/RAM setup, so I built a way to tailor models and leverage extra VRAM/RAM to reduce perplexity.
-3. **Automated PPL-Optimal Quantization** – To my knowledge, there was no flexible, automated method to minimize perplexity for any bits-per-weight (bpw) target—so I created one with excellent results!
+3. **Automated PPL-Optimal Quantization** – To my knowledge, there was no open source flexible, automated method to minimize perplexity for any bits-per-weight (bpw) target—so I created one with excellent results!
 
 ---
 
@@ -85,7 +87,7 @@ Check out the [GGUF Tool Suite README](https://github.com/Thireus/GGUF-Tool-Suit
 2. 📥 **Download Model Shards** – Use `quant_downloader.sh` to fetch GGUF shards from any recipe.
    - Recipe examples: https://github.com/Thireus/GGUF-Tool-Suite/tree/main/recipe_examples
 3. 🧠 **Run a Downloaded Model** – Sample usage with `llama-cli`.
-4. 🛠️ **Generate a Custom Recipe** – Produce recipes tailored to your rig for optimal perplexity.
+4. 🛠️ **Generate a Custom Recipe** – Produce recipes tailored to your VRAM/RAM target usage for optimum perplexity.
 
 ---
 
@@ -95,11 +97,11 @@ Supported models are listed under `models/` in the [Tool Suite Github repo](htt
 
 ---
 
-## 🤷‍♂️ Will I release pre-cooked GGUF files?
+## 🤷‍♂️ Will I release baked dynamic quant GGUFs?
 
-No, because I believe in **tailored quantization** for each user’s hardware. If you prefer ready-made shards, you are welcome to merge them via `llama-gguf-split --merge`, or request someone to publish them.
+No, because I believe in **tailored quantization** for each user’s hardware. If you prefer ready-made shards, you are welcome to merge them via `llama-gguf-split --merge`, or request someone to publish them, or rely on generic GGUF dynamic quants such as [unsloth](https://huggingface.co/unsloth)'s.
 
-Instead, I prefer to share examples of recipes so users can see exactly how they were produced (command included inside these recipe files) and tweak them for their own rigs. The `quant_downloader.sh` script handles automatic fetching and verification of each shard. Recipes provided by [Ubergarm](https://huggingface.co/ubergarm) on his model cards are also compatible with `quant_downloader.sh`.
+Instead, I prefer to share examples of recipes so users can see exactly how they were produced (command included inside these recipe files) and tweak them for their own rigs. The `quant_downloader.sh` script handles automatic fetching and verification of each shard. Note that recipes provided by [Ubergarm](https://huggingface.co/ubergarm) on his model cards are also compatible with `quant_downloader.sh`.
 
 Users who don’t trust the GGUF shards on HuggingFace can also quantize their own by passing recipe lines to `llama-quantize --custom-q` ([see example](https://github.com/Thireus/GGUF-Tool-Suite/blob/main/models/DeepSeek-R1-0528/DeepSeek-R1-0528-THIREUS-ANY-SPECIAL.sh#L482-L486)). Run `llama-quantize --help` to list compatible quants for `quant_assign.py`. This approach is especially useful if you prefer `llama.cpp` over `ik_llama.cpp`.
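For readers who do want a single-file GGUF after downloading shards, the `llama-gguf-split --merge` route mentioned above takes the first shard and an output path. A minimal sketch, assuming the tool sits next to `llama-cli` in the ik_llama.cpp build directory; the shard and output names are placeholders:

```
# Sketch only; point it at the actual first file of your downloaded shard set
~/ik_llama.cpp/build/bin/llama-gguf-split --merge \
  DeepSeek-R1-0528-THIREUS-00001-of-XXXXX.gguf \
  DeepSeek-R1-0528-merged.gguf
```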
 
@@ -116,7 +118,7 @@ Users who don’t trust the GGUF shards on HuggingFace can also quantize their o
 
 ## 💡 Pro Tips
 
-You can download the BF16 model version to quantize your own shards:
+You can easily download the BF16 model version to quantize your own shards:
 
 ```
 mkdir kitchen
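Tying the BF16 download above to the `llama-quantize --custom-q` route mentioned earlier, here is a minimal sketch of such an invocation. It assumes `--custom-q` takes comma-separated `regex=quant` pairs as in the linked ANY-SPECIAL.sh example; file names, regexes, quant types and the final fallback type are all placeholders:

```
# Sketch only; check the linked example script for the exact --custom-q syntax
~/ik_llama.cpp/build/bin/llama-quantize \
  --custom-q "output\.weight=q8_0,blk\..*\.ffn_.*_exps\.weight=iq3_k" \
  DeepSeek-R1-0528-BF16.gguf DeepSeek-R1-0528-CUSTOM.gguf q8_0
```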
 