Thireus committed
Commit · edc23d6 · 1 Parent(s): 5f38f56
Update README.md

README.md CHANGED
@@ -12,7 +12,7 @@ This repository provides **GGUF-quantized tensors** for the DeepSeek-V3-0324 mod
 - 🛠️ Create your own recipe: https://colab.research.google.com/github/Thireus/GGUF-Tool-Suite/blob/main/quant_recipe_pipeline.ipynb
 - 📂 Browse available quant shards: https://huggingface.co/Thireus/collections
 
-*tl;dr
+*tl;dr: Expand the details section below*
 <details>
 
 ```
@@ -40,6 +40,8 @@ cp -f models/DeepSeek-R1-0528/download.conf . # Use the download.conf of the cho
 mkdir -p kitchen && cd kitchen
 ../quant_downloader.sh ../recipe_examples/DeepSeek-R1-0528.THIREUS-1.9364bpw-4.3533ppl.151GB-GGUF_11GB-GPU_140GB-CPU.3c88ec6_9fd615d.recipe
 
+# Other recipe examples can be found at https://github.com/Thireus/GGUF-Tool-Suite/tree/main/recipe_examples
+
 # Launch ik_llama's llama-cli:
 ulimit -n 99999 # Lifts "too many open files" limitation on Linux
 ~/ik_llama.cpp/build/bin/llama-cli \
@@ -60,7 +62,7 @@ ulimit -n 99999 # Lifts "too many open files" limitation on Linux
 
 1. **Compatibility & Speed** – [unsloth](https://huggingface.co/unsloth)’s dynamic quants may not always work optimally with `ik_llama.cpp`.
 2. **Custom Rig Fit** – No off-the-shelf GGUF model perfectly matched my VRAM/RAM setup, so I built a way to tailor models and leverage extra VRAM/RAM to reduce perplexity.
-3. **Automated PPL-Optimal Quantization** – To my knowledge, there was no flexible, automated method to minimize perplexity for any bits-per-weight (bpw) target—so I created one with excellent results!
+3. **Automated PPL-Optimal Quantization** – To my knowledge, there was no open-source, flexible, automated method to minimize perplexity for any bits-per-weight (bpw) target—so I created one with excellent results!
 
 ---
 
@@ -85,7 +87,7 @@ Check out the [GGUF Tool Suite README](https://github.com/Thireus/GGUF-Tool-Suit
 2. 📥 **Download Model Shards** – Use `quant_downloader.sh` to fetch GGUF shards from any recipe.
 - Recipe examples: https://github.com/Thireus/GGUF-Tool-Suite/tree/main/recipe_examples
 3. 🧠 **Run a Downloaded Model** – Sample usage with `llama-cli`.
-4. 🛠️ **Generate a Custom Recipe** – Produce recipes tailored to your
+4. 🛠️ **Generate a Custom Recipe** – Produce recipes tailored to your VRAM/RAM target usage for optimum perplexity.
 
 ---
 
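Steps 2 and 3 in the list above correspond to the commands already shown in the quick-start snippet earlier in this diff. A minimal sketch, reusing that example recipe filename and assuming the same `~/ik_llama.cpp` build path (the model shard name and prompt are illustrative placeholders):

```
# Step 2 – fetch the GGUF shards listed in a recipe (example recipe from the snippet above)
mkdir -p kitchen && cd kitchen
../quant_downloader.sh ../recipe_examples/DeepSeek-R1-0528.THIREUS-1.9364bpw-4.3533ppl.151GB-GGUF_11GB-GPU_140GB-CPU.3c88ec6_9fd615d.recipe

# Step 3 – run the downloaded model with llama-cli (flags shown are illustrative)
ulimit -n 99999   # lifts the "too many open files" limit on Linux
~/ik_llama.cpp/build/bin/llama-cli -m ./DeepSeek-R1-0528-THIREUS-00001-of-XXXXX.gguf -p "Hello"
```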
@@ -95,11 +97,11 @@ Supported models are listed under `models/` in the [Tool Suite Github repo](http
 
 ---
 
-## 🤷♂️ Will I release
+## 🤷♂️ Will I release baked dynamic quant GGUFs?
 
-No, because I believe in **tailored quantization** for each user’s hardware. If you prefer ready-made shards, you are welcome to merge them via `llama-gguf-split --merge`, or request someone to publish them.
+No, because I believe in **tailored quantization** for each user’s hardware. If you prefer ready-made shards, you are welcome to merge them via `llama-gguf-split --merge`, request someone to publish them, or rely on generic GGUF dynamic quants such as [unsloth](https://huggingface.co/unsloth)'s.
 
-Instead, I prefer to share examples of recipes so users can see exactly how they were produced (command included inside these recipe files) and tweak them for their own rigs. The `quant_downloader.sh` script handles automatic fetching and verification of each shard.
+Instead, I prefer to share example recipes so users can see exactly how they were produced (the command used is included inside each recipe file) and tweak them for their own rigs. The `quant_downloader.sh` script handles automatic fetching and verification of each shard. Note that recipes provided by [Ubergarm](https://huggingface.co/ubergarm) on his model cards are also compatible with `quant_downloader.sh`.
 
 Users who don’t trust the GGUF shards on HuggingFace can also quantize their own by passing recipe lines to `llama-quantize --custom-q` ([see example](https://github.com/Thireus/GGUF-Tool-Suite/blob/main/models/DeepSeek-R1-0528/DeepSeek-R1-0528-THIREUS-ANY-SPECIAL.sh#L482-L486)). Run `llama-quantize --help` to list compatible quants for `quant_assign.py`. This approach is especially useful if you prefer `llama.cpp` over `ik_llama.cpp`.
 
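To make the two options in this hunk concrete, here is a hedged sketch. The merge call uses the standard `llama-gguf-split --merge <first-shard> <output>` form; the self-quantization line assumes `--custom-q` accepts comma-separated `tensor-regex=QTYPE` pairs built from recipe lines, as in the linked `DeepSeek-R1-0528-THIREUS-ANY-SPECIAL.sh` example. Filenames, regexes, and quant types below are placeholders; `llama-quantize --help` has the authoritative syntax.

```
# Option 1 – merge downloaded shards into a single GGUF (filenames are placeholders)
~/ik_llama.cpp/build/bin/llama-gguf-split --merge \
  DeepSeek-R1-0528-THIREUS-00001-of-XXXXX.gguf DeepSeek-R1-0528-merged.gguf

# Option 2 – quantize your own shards by passing recipe lines to --custom-q
# (regexes and quant types are illustrative; take the real ones from a recipe file)
~/ik_llama.cpp/build/bin/llama-quantize \
  --custom-q "blk\.[0-9]+\.ffn_down_exps\.weight=iq3_k,blk\.[0-9]+\.attn_.*=q8_0" \
  DeepSeek-R1-0528-BF16.gguf DeepSeek-R1-0528-custom.gguf q8_0
```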
@@ -116,7 +118,7 @@ Users who don’t trust the GGUF shards on HuggingFace can also quantize their o
 
 ## 💡 Pro Tips
 
-You can download the BF16 model version to quantize your own shards:
+You can easily download the BF16 model version to quantize your own shards:
 
 ```
 mkdir kitchen