Nick Doiron committed • Commit c57ec09 • Parent(s): 53eff1c

quantize-then-dequantize

Files changed:
- .gitignore +1 -0
- README.md +4 -14
- pytorch_model-00001-of-00002.bin +2 -2
- pytorch_model-00002-of-00002.bin +2 -2
.gitignore ADDED
@@ -0,0 +1 @@
+.DS_Store
README.md CHANGED
@@ -22,7 +22,7 @@ Essentials:
 - Based on LLaMa2-7b-hf (version 2, 7B params)
 - Used [QLoRA](https://github.com/artidoro/qlora/blob/main/qlora.py) to fine-tune on [13k rows of /r/AskNYC](https://huggingface.co/datasets/monsoon-nlp/asknyc-chatassistant-format) formatted as Human/Assistant exchanges
 - Released [the adapter weights](https://huggingface.co/monsoon-nlp/nyc-savvy-llama2-7b)
-- Merged LLaMa2 and the adapter weights
+- Merged [quantized-then-dequantized LLaMa2](https://gist.github.com/ChrisHayduk/1a53463331f52dca205e55982baf9930) and the adapter weights to produce this full-sized model
 
 ## Prompt options
 
@@ -100,19 +100,9 @@ python3 qlora.py \
 
 What you get in the `output_dir` is an adapter model. [Here's ours](https://huggingface.co/monsoon-nlp/nyc-savvy-llama2-7b-lora-adapter/). Cool, but not as easy to drop into their script.
 
-
-
-
-m = AutoModelForCausalLM.from_pretrained(
-    model_name,
-    #load_in_4bit=True,
-    torch_dtype=torch.bfloat16,
-    #device_map={"": 0},
-)
-m = PeftModel.from_pretrained(m, adapters_name)
-m = m.merge_and_unload()
-m.save_pretrained("nyc-savvy")
-```
+Two options for merging:
+- The included `peftmerger.py` script merges the adapter and saves the model.
+- Chris Hayduk produced a script to [quantize then de-quantize](https://gist.github.com/ChrisHayduk/1a53463331f52dca205e55982baf9930) the base model before merging a QLoRA adapter. This requires bitsandbytes and a GPU.
 
 ## Testing that the model is NYC-savvy
 
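The commit's theme — quantize then de-quantize before merging — can be illustrated without a GPU or bitsandbytes. The sketch below round-trips weights through a symmetric absmax integer grid, a deliberate simplification of bitsandbytes' NF4 quantization: the point is that the base weights get replaced by the values the quantized model actually exposed during QLoRA fine-tuning, so a later merge matches what the adapter was trained against. The function name and example weights are illustrative, not part of this repo:

```python
def quantize_dequantize(weights, bits=4):
    """Round-trip weights through a symmetric absmax integer grid.

    Simplified stand-in for the quantize-then-dequantize step applied
    to the base model before merging a QLoRA adapter (real NF4 uses a
    non-uniform grid and per-block scales).
    """
    levels = 2 ** (bits - 1) - 1                    # 7 signed levels for 4-bit
    scale = max(abs(w) for w in weights) / levels   # per-tensor absmax scale

    def round_trip(w):
        q = max(-levels, min(levels, round(w / scale)))  # quantize to int
        return q * scale                                 # dequantize to float

    return [round_trip(w) for w in weights]

weights = [0.12, -0.5, 0.33, 0.07]
restored = quantize_dequantize(weights)
# each restored weight lands within half a quantization step of the original
```

After this round trip, merging the adapter into `restored`-style weights keeps the merge consistent with the quantized view of the model the adapter saw in training.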
pytorch_model-00001-of-00002.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:6875060db94711a55e3aefe325355c28b260fb3bd5795add8707cfe8fe8340b8
+size 9976623130
pytorch_model-00002-of-00002.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:35f9ab7de991127d8aee80f8f6fea00e73385f303121ac995c1afd51fd2551ba
+size 3500311811
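The `.bin` diffs above are git-lfs pointer files, not the weights themselves: a version line followed by space-separated `oid` and `size` fields. A minimal sketch of reading such a pointer (the function name is illustrative):

```python
def parse_lfs_pointer(text):
    """Split a git-lfs pointer file into its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")  # first space separates key/value
        fields[key] = value
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:6875060db94711a55e3aefe325355c28b260fb3bd5795add8707cfe8fe8340b8
size 9976623130
"""
info = parse_lfs_pointer(pointer)
# info["oid"] carries the sha256 digest; info["size"] is the blob size in bytes
```

This is why the diff shows only the hash and byte count changing: the ~10 GB and ~3.5 GB shards live in LFS storage, keyed by those digests.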