Commit 6972878 by TheBloke · 1 Parent(s): 65a1231

First commit of GPTQ model

README.md ADDED
@@ -0,0 +1,83 @@
1
+ ---
2
+ datasets:
3
+ - gozfarb/ShareGPT_Vicuna_unfiltered
4
+ ---
5
+
6
+ # VicUnlocked-30B-LoRA GPTQ
7
+
8
+ This repo contains a 4-bit GPTQ quantised model of [Neko Institute of Science's VicUnLocked 30B LoRA](https://huggingface.co/Neko-Institute-of-Science/VicUnLocked-30b-LoRA).
9
+
10
+ The files in this repo are the result of merging the above LoRA with the original LLaMA 30B, then quantising to 4-bit using [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa).
11
+
12
+ ## Repositories available
13
+
14
+ * [4-bit, 5-bit and 8-bit GGML models for CPU inference](https://huggingface.co/TheBloke/VicUnlocked-30B-LoRA-GGML).
15
+ * [4-bit GPTQ model for GPU inference](https://huggingface.co/TheBloke/VicUnlocked-30B-LoRA-GPTQ).
16
+ * [float16 HF format model for GPU inference and further conversions](https://huggingface.co/TheBloke/VicUnlocked-30B-LoRA-HF).
17
+
18
+ ## How to easily download and use this model in text-generation-webui
19
+
20
+ Open the text-generation-webui UI as normal.
21
+
22
+ 1. Click the **Model tab**.
23
+ 2. Under **Download custom model or LoRA**, enter `TheBloke/VicUnlocked-30B-LoRA-GPTQ`.
24
+ 3. Click **Download**.
25
+ 4. Wait until it says it's finished downloading.
26
+ 5. Click the **Refresh** icon next to **Model** in the top left.
27
+ 6. In the **Model drop-down**: choose the model you just downloaded, `VicUnlocked-30B-LoRA-GPTQ`.
28
+ 7. If you see an error in the bottom right, ignore it - it's temporary.
29
+ 8. Fill out the `GPTQ parameters` on the right: `Bits = 4`, `Groupsize = None`, `model_type = Llama`
30
+ 9. Click **Save settings for this model** in the top right.
31
+ 10. Click **Reload the Model** in the top right.
32
+ 11. Once it says it's loaded, click the **Text Generation tab** and enter a prompt!
33
+
34
+ ## Provided files
35
+
36
+ **Compatible file - VicUnlocked-30B-GPTQ-4bit.act-order.safetensors**
37
+
38
+ In the `main` branch - the default one - you will find `VicUnlocked-30B-GPTQ-4bit.act-order.safetensors`.
39
+
40
+ This will work with all versions of GPTQ-for-LLaMa. It has maximum compatibility.
41
+
42
+ It was created without groupsize so as to minimise VRAM requirements. It was created with the `--act-order` parameter to improve inference quality.
43
+
44
+ * `VicUnlocked-30B-GPTQ-4bit.act-order.safetensors`
45
+ * Works with all versions of GPTQ-for-LLaMa code, both Triton and CUDA branches
46
+ * Works with AutoGPTQ.
47
+ * Works with text-generation-webui one-click-installers
48
+ * Parameters: Groupsize = None. act-order.
49
+ * Command used to create the GPTQ:
50
+ ```
51
+ llama.py /workspace/vicunlocked-30b/HF wikitext2 --wbits 4 --true-sequential --act-order --save_safetensors /workspace/vicunlocked-30b/gptq/VicUnlocked-30B-GPTQ-4bit.act-order.safetensors
52
+ ```
53
+
54
+
55
+ # Original model card
56
+
57
+ # Convert tools
58
+ https://github.com/practicaldreamer/vicuna_to_alpaca
59
+
60
+ # Training tool
61
+ https://github.com/oobabooga/text-generation-webui
62
+
63
+ ATM I'm using 2023.05.04v0 of the dataset and training full context.
64
+
65
+ # Notes:
66
+ So I will only be training 1 epoch, as full context 30b takes so long to train.
67
+ This 1 epoch will take me 8 days lol but luckily these LoRAs feel fully functional at epoch 1, as shown by my 13b one.
68
+ Also I will be uploading checkpoints almost every day. I could train another epoch if there's enough want for it.
69
+
70
+ Update: Since I will not be training over 1 epoch, @Aeala is training for the full 3 (https://huggingface.co/Aeala/VicUnlocked-alpaca-half-30b-LoRA), but it's half ctx if you care about that. Also @Aeala's just about done.
71
+
72
+ Update: Training finished at epoch 1. These 8 days sure felt long. I only have one A6000, lads; there's only so much I can do. Also RIP gozfarb, IDK what happened to him.
73
+
74
+ # How to test?
75
+ 1. Download LLaMA-30B-HF if you have not: https://huggingface.co/Neko-Institute-of-Science/LLaMA-30B-HF
76
+ 2. Make a folder called VicUnLocked-30b-LoRA in the loras folder.
77
+ 3. Download adapter_config.json and adapter_model.bin into VicUnLocked-30b-LoRA.
78
+ 4. Load ooba: ```python server.py --listen --model LLaMA-30B-HF --load-in-8bit --chat --lora VicUnLocked-30b-LoRA```
79
+ 5. Select instruct and choose the Vicuna-v1.1 template.
80
+
81
+
82
+ # Training Log
83
+ https://wandb.ai/neko-science/VicUnLocked/runs/vx8yzwi7
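The README above says the quantised file works with AutoGPTQ. A minimal Python sketch of such a load follows, assuming the `auto_gptq` and `transformers` packages are installed and a CUDA GPU with enough VRAM is available. The helper names `gptq_settings` and `load_model` are hypothetical, and the basename is taken from the safetensors file added in this commit; treat it as a starting point rather than a verified recipe.

```python
REPO_ID = "TheBloke/VicUnlocked-30B-LoRA-GPTQ"
# File name (without extension) of the safetensors file added in this commit.
MODEL_BASENAME = "VicUnlocked-30B-GPTQ-4bit.act-order"


def gptq_settings():
    """Map the README's GPTQ parameters onto AutoGPTQ's config fields."""
    return {
        "bits": 4,         # README: Bits = 4 (--wbits 4)
        "group_size": -1,  # README: Groupsize = None
        "desc_act": True,  # README: created with --act-order
    }


def load_model(device="cuda:0"):
    """Load tokenizer and quantised weights; needs auto_gptq and a GPU."""
    # Imported lazily so the settings above can be inspected without a GPU stack.
    from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(REPO_ID, use_fast=True)
    model = AutoGPTQForCausalLM.from_quantized(
        REPO_ID,
        model_basename=MODEL_BASENAME,
        use_safetensors=True,
        device=device,
        quantize_config=BaseQuantizeConfig(**gptq_settings()),
    )
    return tokenizer, model
```

Calling `load_model()` downloads roughly 17 GB of weights, so it is kept separate from the settings helper, which can be checked on any machine.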
VicUnlocked-30B-GPTQ-4bit.act-order.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1c55b158251901afd8671ff738e95913ea38094f84a0e9903d8851799b8ee9d2
3
+ size 16940128404
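The three lines above are a Git LFS pointer, not the weights themselves: the repository stores only a spec version, a SHA-256 object ID, and a byte count. As an illustration, the sketch below parses that pointer text and converts the size to GiB. The function name `parse_lfs_pointer` is hypothetical and the parser handles only the three keys shown here.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into its version, hash and size fields."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>", separated by the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    hash_algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


# Pointer contents copied from the diff above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:1c55b158251901afd8671ff738e95913ea38094f84a0e9903d8851799b8ee9d2
size 16940128404
"""

info = parse_lfs_pointer(POINTER)
size_gib = info["size_bytes"] / 2**30
print(f"{info['hash_algo']} {info['digest'][:12]}... {size_gib:.2f} GiB")
```

After downloading, the digest can be compared against a locally computed SHA-256 of the file to confirm the download is intact.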
config.json ADDED
@@ -0,0 +1,23 @@
1
+ {
2
+ "_name_or_path": "/workspace/models/LLaMA-30B-HF",
3
+ "architectures": [
4
+ "LlamaForCausalLM"
5
+ ],
6
+ "bos_token_id": 1,
7
+ "eos_token_id": 2,
8
+ "hidden_act": "silu",
9
+ "hidden_size": 6656,
10
+ "initializer_range": 0.02,
11
+ "intermediate_size": 17920,
12
+ "max_position_embeddings": 2048,
13
+ "model_type": "llama",
14
+ "num_attention_heads": 52,
15
+ "num_hidden_layers": 60,
16
+ "pad_token_id": 0,
17
+ "rms_norm_eps": 1e-06,
18
+ "tie_word_embeddings": false,
19
+ "torch_dtype": "float16",
20
+ "transformers_version": "4.29.2",
21
+ "use_cache": true,
22
+ "vocab_size": 32000
23
+ }
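The dimensions in this config determine the model's parameter count. The sketch below works it out, assuming the standard Hugging Face `LlamaForCausalLM` layout: four bias-free attention projections, a three-matrix gated MLP, two RMSNorm weights per layer, and separate input and output embeddings (consistent with `tie_word_embeddings: false`). The function name `llama_param_count` is hypothetical.

```python
def llama_param_count(hidden_size, intermediate_size, num_hidden_layers, vocab_size):
    """Count parameters for a LLaMA-style decoder with untied embeddings."""
    attention = 4 * hidden_size * hidden_size  # q, k, v and o projections, no biases
    mlp = 3 * hidden_size * intermediate_size  # gate, up and down projections
    norms = 2 * hidden_size                    # two RMSNorm weight vectors per layer
    per_layer = attention + mlp + norms
    embeddings = 2 * vocab_size * hidden_size  # input embedding plus untied LM head
    final_norm = hidden_size
    return num_hidden_layers * per_layer + embeddings + final_norm


# Values copied from config.json above.
total = llama_param_count(
    hidden_size=6656,
    intermediate_size=17920,
    num_hidden_layers=60,
    vocab_size=32000,
)
head_dim = 6656 // 52  # hidden_size / num_attention_heads
print(f"{total:,} parameters, head_dim={head_dim}")
```

Under these assumptions the total comes to about 32.5 billion, which is why the model commonly called "30B" produces a 4-bit file of roughly 17 GB.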
generation_config.json ADDED
@@ -0,0 +1,7 @@
1
+ {
2
+ "_from_model_config": true,
3
+ "bos_token_id": 1,
4
+ "eos_token_id": 2,
5
+ "pad_token_id": 0,
6
+ "transformers_version": "4.29.2"
7
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,23 @@
1
+ {
2
+ "bos_token": {
3
+ "content": "<s>",
4
+ "lstrip": false,
5
+ "normalized": true,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "eos_token": {
10
+ "content": "</s>",
11
+ "lstrip": false,
12
+ "normalized": true,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "unk_token": {
17
+ "content": "<unk>",
18
+ "lstrip": false,
19
+ "normalized": true,
20
+ "rstrip": false,
21
+ "single_word": false
22
+ }
23
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer.model ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
3
+ size 499723
tokenizer_config.json ADDED
@@ -0,0 +1,33 @@
1
+ {
2
+ "add_bos_token": true,
3
+ "add_eos_token": false,
4
+ "bos_token": {
5
+ "__type": "AddedToken",
6
+ "content": "<s>",
7
+ "lstrip": false,
8
+ "normalized": true,
9
+ "rstrip": false,
10
+ "single_word": false
11
+ },
12
+ "clean_up_tokenization_spaces": false,
13
+ "eos_token": {
14
+ "__type": "AddedToken",
15
+ "content": "</s>",
16
+ "lstrip": false,
17
+ "normalized": true,
18
+ "rstrip": false,
19
+ "single_word": false
20
+ },
21
+ "model_max_length": 1000000000000000019884624838656,
22
+ "pad_token": null,
23
+ "sp_model_kwargs": {},
24
+ "tokenizer_class": "LlamaTokenizer",
25
+ "unk_token": {
26
+ "__type": "AddedToken",
27
+ "content": "<unk>",
28
+ "lstrip": false,
29
+ "normalized": true,
30
+ "rstrip": false,
31
+ "single_word": false
32
+ }
33
+ }
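The `model_max_length` value in this file looks arbitrary, but it is exactly what `int(1e30)` evaluates to in Python, which suggests it is a "no limit configured" sentinel rather than a real context size. The sketch below checks that and, as an assumption about how a caller might handle it, falls back to `max_position_embeddings` from `config.json`. The helper `effective_context` is hypothetical.

```python
# Value copied from tokenizer_config.json in this commit.
MODEL_MAX_LENGTH = 1000000000000000019884624838656
# Value copied from config.json in this commit.
MAX_POSITION_EMBEDDINGS = 2048


def effective_context(tokenizer_limit, model_limit):
    """Return the tighter of the tokenizer and model length limits."""
    return min(tokenizer_limit, model_limit)


# 1e30 is not exactly representable as a float, hence the odd trailing digits.
is_sentinel = MODEL_MAX_LENGTH == int(1e30)
context = effective_context(MODEL_MAX_LENGTH, MAX_POSITION_EMBEDDINGS)
print(is_sentinel, context)
```

In practice this means truncation lengths should be set explicitly when tokenizing, since the tokenizer config alone will not enforce the 2048-token window.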