Update README.md

README.md

Load text-generation-webui as you normally do.

8. Click **Reload the Model** in the top right.
9. Once it says it's loaded, click the **Text Generation tab** and enter a prompt!

## Provided files

I have uploaded two versions of the GPTQ.

**Compatible file - stable-vicuna-13B-GPTQ-4bit.compat.no-act-order.safetensors**

In the `main` branch - the default one - you will find `stable-vicuna-13B-GPTQ-4bit.compat.no-act-order.safetensors`.

This will work with all versions of GPTQ-for-LLaMa, giving it maximum compatibility.

It was created without the `--act-order` parameter. It may have slightly lower inference quality than the other file, but it is guaranteed to work with all versions of GPTQ-for-LLaMa and text-generation-webui.
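
If you'd rather download this file directly than have text-generation-webui fetch it, Hugging Face's standard `resolve` URL pattern works; note the repo path below is my assumption based on the model name, so substitute this repo's actual id:

```
# Hypothetical repo path - replace with this repo's actual id.
# Downloads the compat file from the main branch.
wget https://huggingface.co/TheBloke/stable-vicuna-13B-GPTQ/resolve/main/stable-vicuna-13B-GPTQ-4bit.compat.no-act-order.safetensors
```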

* `stable-vicuna-13B-GPTQ-4bit.compat.no-act-order.safetensors`
  * Works with all versions of GPTQ-for-LLaMa code, both Triton and CUDA branches
  * Works with text-generation-webui one-click-installers

Command used to create it:

```
CUDA_VISIBLE_DEVICES=0 python3 llama.py stable-vicuna-13B-HF c4 --wbits 4 --true-sequential --groupsize 128 --save_safetensors stable-vicuna-13B-GPTQ-4bit.no-act-order.safetensors
```

**Latest file - stable-vicuna-13B-GPTQ-4bit.latest.act-order.safetensors**

This file was created for more recent versions of GPTQ-for-LLaMa, and uses the `--act-order` flag for maximum theoretical performance.

To access this file, please switch to the `latest` branch of this repo and download from there.
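
As a sketch of one way to do that from the command line (the repo URL is again my assumption; `git-lfs` is needed to pull the actual weights):

```
# Clone only the `latest` branch of the repo.
# Repo URL is an assumption - adjust to this repo's actual location.
git lfs install
git clone -b latest --single-branch https://huggingface.co/TheBloke/stable-vicuna-13B-GPTQ
```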

* `stable-vicuna-13B-GPTQ-4bit.latest.act-order.safetensors`
  * Only works with recent GPTQ-for-LLaMa code
  * **Does not** work with text-generation-webui one-click-installers

Command used to create it:

```
CUDA_VISIBLE_DEVICES=0 python3 llama.py stable-vicuna-13B-HF c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save_safetensors stable-vicuna-13B-GPTQ-4bit.act-order.safetensors
```

## Manual instructions for `text-generation-webui`

File `stable-vicuna-13B-GPTQ-4bit.compat.no-act-order.safetensors` can be loaded the same as any other GPTQ file, without requiring any updates to [oobabooga's text-generation-webui](https://github.com/oobabooga/text-generation-webui).
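
For instance, a minimal launch might look like the following, using the flags text-generation-webui expects for 4-bit GPTQ models; the model directory name and location under `models/` are assumptions, so adjust them to your install:

```
# Launch with parameters matching the compat file: 4-bit, groupsize 128.
# The model directory name is an assumption - point --model at your folder.
python server.py --model stable-vicuna-13B-GPTQ --wbits 4 --groupsize 128 --model_type llama
```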