|
--- |
|
license: mit |
|
datasets: |
|
- monsoon-nlp/asknyc-chatassistant-format |
|
language: |
|
- en |
|
tags: |
|
- reddit |
|
- asknyc |
|
- nyc |
|
- llama2 |
|
widget: |
|
- text: "### Human: where can I find a good BEC?### Assistant: " |
|
example_title: "Basic prompt" |
|
- text: "A chat between a curious human and an assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.\n### Human: What museums should we visit? - My kids are aged 12 and 5. They love fish### Assistant: " |
|
example_title: "Assistant prompt" |
|
--- |
|
|
|
# nyc-savvy-llama2-7b |
|
|
|
Essentials: |
|
- Based on LLaMa2-7b-hf (version 2, 7B params) |
|
- Used [QLoRA](https://github.com/artidoro/qlora/blob/main/qlora.py) to fine-tune on [13k rows of /r/AskNYC](https://huggingface.co/datasets/monsoon-nlp/asknyc-chatassistant-format) formatted as Human/Assistant exchanges |
|
- Released [the adapter weights](https://huggingface.co/monsoon-nlp/nyc-savvy-llama2-7b) |
|
- Merged [quantized-then-dequantized LLaMa2](https://gist.github.com/ChrisHayduk/1a53463331f52dca205e55982baf9930) and the adapter weights to produce this full-sized model |
|
|
|
## Prompt options |
|
|
|
Here is the template used in training. It starts with "### Human: " (note the trailing space), then the post title and content, then "### Assistant: " (no space before it, but a trailing space after the colon).
|
|
|
`### Human: Post title - post content### Assistant: ` |
|
|
|
For example: |
|
|
|
`### Human: Where can I find a good bagel? - We are in Brooklyn### Assistant: Anywhere with fresh-baked bagels and lots of cream cheese options.` |
|
|
|
Following [QLoRA's Gradio example](https://colab.research.google.com/drive/17XEqL1JcmVWjHkT-WczdYkJlNINacwG7?usp=sharing), it also helps to prepend a more assistant-style preamble, especially if you adopt their chat format:
|
|
|
``` |
|
A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. |
|
``` |
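
Putting the pieces together, here is a minimal sketch of assembling a full prompt in this format; the `preamble`, `title`, and `body` names are just for illustration and not part of any released script:

```python
# Minimal sketch of building a prompt in the training format.
# `preamble`, `title`, and `body` are illustrative names only.
preamble = (
    "A chat between a curious human and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions.\n"
)
title = "Where can I find a good bagel?"
body = "We are in Brooklyn"

# Note: no space before "### Assistant:", but a trailing space after the colon.
prompt = f"{preamble}### Human: {title} - {body}### Assistant: "
```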
|
|
|
## Training data |
|
|
|
- Collected one month of posts to /r/AskNYC from each year 2015-2019 (no content after July 2019) |
|
- Downloaded from PushShift; kept comments only if their upvote score was >= 3
|
- Originally collected for my GPT-NYC model in spring 2021 - [model](https://huggingface.co/monsoon-nlp/gpt-nyc) / [blog](https://mapmeld.medium.com/gpt-nyc-part-1-9cb698b2e3d) |
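
If you want to browse the data yourself, it loads with the standard `datasets` library; the `train` split name is an assumption about how the Hub dataset is organized:

```python
# Peek at the chat-assistant-formatted AskNYC data (assumes a `train` split).
from datasets import load_dataset

ds = load_dataset("monsoon-nlp/asknyc-chatassistant-format")
print(ds)
print(ds["train"][0])
```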
|
|
|
## Training script |
|
|
|
Training takes about 2 hours on Colab once everything is configured correctly. QLoRA's trainer is controlled by max_steps rather than epochs, so I chose a step count that works out to roughly 1 epoch (see the note after the script).
|
|
|
```bash
git clone https://github.com/artidoro/qlora
cd qlora

pip3 install -r requirements.txt --quiet

python3 qlora.py \
    --model_name_or_path ../llama-2-7b-hf \
    --use_auth \
    --output_dir ../nyc-savvy-llama2-7b \
    --logging_steps 10 \
    --save_strategy steps \
    --data_seed 42 \
    --save_steps 500 \
    --save_total_limit 40 \
    --dataloader_num_workers 1 \
    --group_by_length False \
    --logging_strategy steps \
    --remove_unused_columns False \
    --do_train \
    --num_train_epochs 1 \
    --lora_r 64 \
    --lora_alpha 16 \
    --lora_modules all \
    --double_quant \
    --quant_type nf4 \
    --bf16 \
    --bits 4 \
    --warmup_ratio 0.03 \
    --lr_scheduler_type constant \
    --gradient_checkpointing \
    --dataset /content/gpt_nyc.jsonl \
    --dataset_format oasst1 \
    --source_max_len 16 \
    --target_max_len 512 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --max_steps 760 \
    --learning_rate 0.0002 \
    --adam_beta2 0.999 \
    --max_grad_norm 0.3 \
    --lora_dropout 0.1 \
    --weight_decay 0.0 \
    --seed 0
```
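
As a rough sanity check on `--max_steps`: with `per_device_train_batch_size 1` and `gradient_accumulation_steps 16`, the effective batch size is 16, so one pass over the ~13k-row dataset is on the order of 800 optimizer steps, and 760 steps lands right around one epoch. A back-of-the-envelope sketch (the exact row count is approximate):

```python
# Back-of-the-envelope: optimizer steps per epoch with this config.
rows = 13_000                  # approximate size of the AskNYC dataset
effective_batch = 1 * 16       # per_device_train_batch_size * gradient_accumulation_steps
print(rows / effective_batch)  # ~812 steps for a full pass; --max_steps 760 is just under one epoch
```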
|
|
|
## Merging it back |
|
|
|
What you get in the `output_dir` is an adapter model. [Here's ours](https://huggingface.co/monsoon-nlp/nyc-savvy-llama2-7b-lora-adapter/). Cool, but an adapter on its own is not as easy to drop into downstream inference scripts as a full model.
|
|
|
Two options for merging: |
|
- The `peftmerger.py` script included in this repo merges the adapter into the base model and saves the result (a rough sketch of this kind of merge appears after this list).
|
- Chris Hayduk produced a script to [quantize then de-quantize](https://gist.github.com/ChrisHayduk/1a53463331f52dca205e55982baf9930) the base model before merging a QLoRA adapter. This requires bitsandbytes and a GPU. |
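
For reference, a plain PEFT merge (the first option, without the quantize/de-quantize step) looks roughly like the sketch below. It uses the standard `peft` API and is not necessarily line-for-line what `peftmerger.py` does; the local paths are placeholders:

```python
# Sketch of a standard PEFT merge; paths are placeholders and this may differ
# from what peftmerger.py actually does.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, LlamaTokenizer

base = AutoModelForCausalLM.from_pretrained("../llama-2-7b-hf", torch_dtype=torch.float16)
merged = PeftModel.from_pretrained(base, "monsoon-nlp/nyc-savvy-llama2-7b-lora-adapter")
merged = merged.merge_and_unload()            # fold the LoRA weights into the base model
merged.save_pretrained("../nyc-savvy-llama2-7b")

tok = LlamaTokenizer.from_pretrained("../llama-2-7b-hf")
tok.save_pretrained("../nyc-savvy-llama2-7b")
```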
|
|
|
## Testing that the model is NYC-savvy |
|
|
|
You might wonder whether the model actually learned anything about NYC, or whether it is the same old LLaMa2. Without the prompt giving away any NYC-specific clues, try this excerpt from the `pefttester.py` script in this repo:
|
|
|
```python
from transformers import AutoModelForCausalLM, LlamaTokenizer, StoppingCriteriaList

model_name = "monsoon-nlp/nyc-savvy-llama2-7b"  # the merged model from this repo

m = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
tok = LlamaTokenizer.from_pretrained(model_name)

# Assistant-style preamble plus the Human/Assistant template used in training
messages = "A chat between a curious human and an assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.\n"
messages += "### Human: What museums should I visit? - My kids are aged 12 and 5"
messages += "### Assistant: "

# move the inputs onto the model's device
input_ids = tok(messages, return_tensors="pt").input_ids.to(m.device)

# ... (elided: pefttester.py defines `stop`, the StoppingCriteria passed to generate below)

temperature = 0.7
top_p = 0.9
top_k = 0
repetition_penalty = 1.1

op = m.generate(
    input_ids=input_ids,
    max_new_tokens=100,
    temperature=temperature,
    do_sample=temperature > 0.0,
    top_p=top_p,
    top_k=top_k,
    repetition_penalty=repetition_penalty,
    stopping_criteria=StoppingCriteriaList([stop]),
)
for line in op:
    print(tok.decode(line))
```
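
The decoded text includes the prompt you sent in, so to pull out just the model's answer you can split on the assistant marker. This is a small post-processing sketch, not part of `pefttester.py`:

```python
# Post-processing sketch (not part of pefttester.py): keep only the assistant's reply.
full_text = tok.decode(op[0], skip_special_tokens=True)
reply = full_text.split("### Assistant: ")[-1]
# If generation ran past the turn, trim at the next "### Human:" marker.
reply = reply.split("### Human:")[0].strip()
print(reply)
```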
|
|