Update README.md

README.md

Load text-generation-webui as you normally do.

8. Click **Reload the Model** in the top right.
9. Once it says it's loaded, click the **Text Generation tab** and enter a prompt!

## Provided files

I have uploaded two versions of the GPTQ.

**Compatible file - stable-vicuna-13B-GPTQ-4bit.compat.no-act-order.safetensors**

In the `main` branch - the default one - you will find `stable-vicuna-13B-GPTQ-4bit.compat.no-act-order.safetensors`.

This will work with all versions of GPTQ-for-LLaMa, giving it maximum compatibility.

It was created without the `--act-order` parameter. It may have slightly lower inference quality than the other file, but it is guaranteed to work with all versions of GPTQ-for-LLaMa and text-generation-webui.
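
If you'd rather download this file directly than have text-generation-webui fetch it, Hugging Face's standard `resolve` URL pattern works; note the repo path below is my assumption based on the model name, so substitute this repo's actual id:

```
# Hypothetical repo path - replace with this repo's actual id.
# Downloads the compat file from the main branch.
wget https://huggingface.co/TheBloke/stable-vicuna-13B-GPTQ/resolve/main/stable-vicuna-13B-GPTQ-4bit.compat.no-act-order.safetensors
```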

* `stable-vicuna-13B-GPTQ-4bit.compat.no-act-order.safetensors`
  * Works with all versions of GPTQ-for-LLaMa code, both Triton and CUDA branches
  * Works with text-generation-webui one-click-installers

Command used to create it:

```
CUDA_VISIBLE_DEVICES=0 python3 llama.py stable-vicuna-13B-HF c4 --wbits 4 --true-sequential --groupsize 128 --save_safetensors stable-vicuna-13B-GPTQ-4bit.no-act-order.safetensors
```

**Latest file - stable-vicuna-13B-GPTQ-4bit.latest.act-order.safetensors**

This file was created for more recent versions of GPTQ-for-LLaMa, and uses the `--act-order` flag for maximum theoretical performance.

To access this file, please switch to the `latest` branch of this repo and download from there.
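
As a sketch of one way to do that from the command line (the repo URL is again my assumption; `git-lfs` is needed to pull the actual weights):

```
# Clone only the `latest` branch of the repo.
# Repo URL is an assumption - adjust to this repo's actual location.
git lfs install
git clone -b latest --single-branch https://huggingface.co/TheBloke/stable-vicuna-13B-GPTQ
```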

* `stable-vicuna-13B-GPTQ-4bit.latest.act-order.safetensors`
  * Only works with recent GPTQ-for-LLaMa code
  * **Does not** work with text-generation-webui one-click-installers

Command used to create it:

```
CUDA_VISIBLE_DEVICES=0 python3 llama.py stable-vicuna-13B-HF c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save_safetensors stable-vicuna-13B-GPTQ-4bit.act-order.safetensors
```

## Manual instructions for `text-generation-webui`

File `stable-vicuna-13B-GPTQ-4bit.compat.no-act-order.safetensors` can be loaded the same as any other GPTQ file, without requiring any updates to [oobabooga's text-generation-webui](https://github.com/oobabooga/text-generation-webui).
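
For instance, a minimal launch might look like the following, using the flags text-generation-webui expects for 4-bit GPTQ models; the model directory name and location under `models/` are assumptions, so adjust them to your install:

```
# Launch with parameters matching the compat file: 4-bit, groupsize 128.
# The model directory name is an assumption - point --model at your folder.
python server.py --model stable-vicuna-13B-GPTQ --wbits 4 --groupsize 128 --model_type llama
```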