Quantization was done using https://github.com/oobabooga/GPTQ-for-LLaMa, for use with https://github.com/oobabooga/text-generation-webui.

Via the following command:
```
python llama.py ./TehVenom_Pygmalion-7b-Merged-Safetensors c4 --wbits 4 --act-order --save_safetensors Pygmalion-7B-GPTQ-4bit.act-order.safetensors
```
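
To load the resulting file in https://github.com/oobabooga/text-generation-webui, the launch command is roughly the following (an assumption on my part, using the 2023-era GPTQ flags; check python server.py --help on your version):

```
python server.py --model Pygmalion-7B-GPTQ-4bit --wbits 4 --model_type llama
```

No --groupsize flag is needed, since the quantization above uses --act-order without grouping.
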
This is the best eval I could get after trying many argument combinations: converting the model from bf16 to fp32 before quantizing down to 4-bit, with --act-order as the sole extra argument (a sketch of that conversion step follows the numbers below). The resulting perplexities (lower is better):

- Wikitext 2: 6.2477378845215
- PTB-New: 46.5129699707031
- C4-New: 7.8470954895020
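
As a minimal sketch of that bf16 → fp32 upcast (illustrative paths, standard transformers APIs; the actual conversion script isn't included here, so this is an approximation):

```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

src = "./TehVenom_Pygmalion-7b-Merged-Safetensors"  # original bf16 merge
dst = "./Pygmalion-7b-fp32"                         # fp32 copy handed to llama.py

# Load the bf16 weights, upcast everything to fp32, and save the result
# (plus tokenizer) where the GPTQ script can pick it up.
model = AutoModelForCausalLM.from_pretrained(src, torch_dtype=torch.bfloat16)
model.float()
model.save_pretrained(dst, safe_serialization=True)
AutoTokenizer.from_pretrained(src).save_pretrained(dst)
```
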
## Prompting
The model was trained on the usual Pygmalion persona + chat format, so any of the usual UIs should already handle everything correctly. If you're using the model directly, this is the expected formatting (sketched below as documented on the earlier Pygmalion model cards; bracketed text is a placeholder):
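
```
[CHARACTER]'s Persona: [A few sentences about the character you want the model to play]
<START>
[DIALOGUE HISTORY]
You: [Your input message here]
[CHARACTER]:
```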