|
--- |
|
license: mit |
|
datasets: |
|
- monsoon-nlp/asknyc-chatassistant-format |
|
language: |
|
- en |
|
tags: |
|
- reddit |
|
- asknyc |
|
- nyc |
|
- llama2 |
|
widget: |
|
- text: "### Human: where can I find a good BEC?### Assistant: " |
|
example_title: "Basic prompt" |
|
- text: "A chat between a curious human and an assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.\n### Human: What museums should we visit? - My kids are aged 12 and 5. They love fish### Assistant: " |
|
example_title: "Assistant prompt" |
|
--- |
|
|
|
# nyc-savvy-llama2-7b |
|
|
|
Essentials: |
|
- Based on LLaMa2-7b-hf (version 2, 7B params) |
|
- Used [QLoRA](https://github.com/artidoro/qlora/blob/main/qlora.py) to fine-tune on [13k rows of /r/AskNYC](https://huggingface.co/datasets/monsoon-nlp/asknyc-chatassistant-format) formatted as Human/Assistant exchanges |
|
- Released [the adapter weights](https://huggingface.co/monsoon-nlp/nyc-savvy-llama2-7b) |
|
- Merged [quantized-then-dequantized LLaMa2](https://gist.github.com/ChrisHayduk/1a53463331f52dca205e55982baf9930) and the adapter weights to produce this full-sized model |
|
|
|
## Prompt options |
|
|
|
Here is the template used in training. It starts with "### Human: " (note the trailing space), then the post title and content, then "### Assistant: " (no space before it, but a trailing space after the colon).
|
|
|
`### Human: Post title - post content### Assistant: ` |
|
|
|
For example: |
|
|
|
`### Human: Where can I find a good bagel? - We are in Brooklyn### Assistant: Anywhere with fresh-baked bagels and lots of cream cheese options.` |
|
|
|
Following [QLoRA's Gradio example](https://colab.research.google.com/drive/17XEqL1JcmVWjHkT-WczdYkJlNINacwG7?usp=sharing), it also helps to prepend a more assistant-style preamble, especially if you adopt their chat format:
|
|
|
``` |
|
A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. |
|
``` |
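
Putting the pieces together, here is a minimal sketch of assembling a full prompt in this format; the `preamble`, `title`, and `body` names are just for illustration and not part of any released script:

```python
# Minimal sketch of building a prompt in the training format.
# `preamble`, `title`, and `body` are illustrative names only.
preamble = (
    "A chat between a curious human and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions.\n"
)
title = "Where can I find a good bagel?"
body = "We are in Brooklyn"

# Note: no space before "### Assistant:", but a trailing space after the colon.
prompt = f"{preamble}### Human: {title} - {body}### Assistant: "
```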
|
|
|
## Training data |
|
|
|
- Collected one month of posts to /r/AskNYC from each year 2015-2019 (no content after July 2019) |
|
- Downloaded from PushShift; kept comments only if their upvote score was >= 3
|
- Originally collected for my GPT-NYC model in spring 2021 - [model](https://huggingface.co/monsoon-nlp/gpt-nyc) / [blog](https://mapmeld.medium.com/gpt-nyc-part-1-9cb698b2e3d) |
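
If you want to browse the data yourself, it loads with the standard `datasets` library; the `train` split name is an assumption about how the Hub dataset is organized:

```python
# Peek at the chat-assistant-formatted AskNYC data (assumes a `train` split).
from datasets import load_dataset

ds = load_dataset("monsoon-nlp/asknyc-chatassistant-format")
print(ds)
print(ds["train"][0])
```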
|
|
|
## Training script |
|
|
|
Training takes about 2 hours on Colab once everything is configured correctly. QLoRA's trainer is controlled by max_steps rather than epochs, so I chose a step count that works out to roughly 1 epoch (see the note after the script).
|
|
|
```bash
git clone https://github.com/artidoro/qlora
cd qlora

pip3 install -r requirements.txt --quiet

python3 qlora.py \
    --model_name_or_path ../llama-2-7b-hf \
    --use_auth \
    --output_dir ../nyc-savvy-llama2-7b \
    --logging_steps 10 \
    --save_strategy steps \
    --data_seed 42 \
    --save_steps 500 \
    --save_total_limit 40 \
    --dataloader_num_workers 1 \
    --group_by_length False \
    --logging_strategy steps \
    --remove_unused_columns False \
    --do_train \
    --num_train_epochs 1 \
    --lora_r 64 \
    --lora_alpha 16 \
    --lora_modules all \
    --double_quant \
    --quant_type nf4 \
    --bf16 \
    --bits 4 \
    --warmup_ratio 0.03 \
    --lr_scheduler_type constant \
    --gradient_checkpointing \
    --dataset /content/gpt_nyc.jsonl \
    --dataset_format oasst1 \
    --source_max_len 16 \
    --target_max_len 512 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --max_steps 760 \
    --learning_rate 0.0002 \
    --adam_beta2 0.999 \
    --max_grad_norm 0.3 \
    --lora_dropout 0.1 \
    --weight_decay 0.0 \
    --seed 0
```
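
As a rough sanity check on `--max_steps`: with `per_device_train_batch_size 1` and `gradient_accumulation_steps 16`, the effective batch size is 16, so one pass over the ~13k-row dataset is on the order of 800 optimizer steps, and 760 steps lands right around one epoch. A back-of-the-envelope sketch (the exact row count is approximate):

```python
# Back-of-the-envelope: optimizer steps per epoch with this config.
rows = 13_000                  # approximate size of the AskNYC dataset
effective_batch = 1 * 16       # per_device_train_batch_size * gradient_accumulation_steps
print(rows / effective_batch)  # ~812 steps for a full pass; --max_steps 760 is just under one epoch
```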
|
|
|
## Merging it back |
|
|
|
What you get in the `output_dir` is an adapter model. [Here's ours](https://huggingface.co/monsoon-nlp/nyc-savvy-llama2-7b-lora-adapter/). Cool, but an adapter on its own is not as easy to drop into downstream inference scripts as a full model.
|
|
|
Two options for merging: |
|
- The `peftmerger.py` script included in this repo merges the adapter into the base model and saves the result (a rough sketch of this kind of merge appears after this list).
|
- Chris Hayduk produced a script to [quantize then de-quantize](https://gist.github.com/ChrisHayduk/1a53463331f52dca205e55982baf9930) the base model before merging a QLoRA adapter. This requires bitsandbytes and a GPU. |
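
For reference, a plain PEFT merge (the first option, without the quantize/de-quantize step) looks roughly like the sketch below. It uses the standard `peft` API and is not necessarily line-for-line what `peftmerger.py` does; the local paths are placeholders:

```python
# Sketch of a standard PEFT merge; paths are placeholders and this may differ
# from what peftmerger.py actually does.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, LlamaTokenizer

base = AutoModelForCausalLM.from_pretrained("../llama-2-7b-hf", torch_dtype=torch.float16)
merged = PeftModel.from_pretrained(base, "monsoon-nlp/nyc-savvy-llama2-7b-lora-adapter")
merged = merged.merge_and_unload()            # fold the LoRA weights into the base model
merged.save_pretrained("../nyc-savvy-llama2-7b")

tok = LlamaTokenizer.from_pretrained("../llama-2-7b-hf")
tok.save_pretrained("../nyc-savvy-llama2-7b")
```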
|
|
|
## Testing that the model is NYC-savvy |
|
|
|
You might wonder whether the model actually learned anything about NYC, or whether it is the same old LLaMa2. Without the prompt giving away any NYC-specific clues, try this excerpt from the `pefttester.py` script in this repo:
|
|
|
```python
from transformers import AutoModelForCausalLM, LlamaTokenizer, StoppingCriteriaList

model_name = "monsoon-nlp/nyc-savvy-llama2-7b"  # the merged model from this repo

m = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
tok = LlamaTokenizer.from_pretrained(model_name)

# Assistant-style preamble plus the Human/Assistant template used in training
messages = "A chat between a curious human and an assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.\n"
messages += "### Human: What museums should I visit? - My kids are aged 12 and 5"
messages += "### Assistant: "

# move the inputs onto the model's device
input_ids = tok(messages, return_tensors="pt").input_ids.to(m.device)

# ... (elided: pefttester.py defines `stop`, the StoppingCriteria passed to generate below)

temperature = 0.7
top_p = 0.9
top_k = 0
repetition_penalty = 1.1

op = m.generate(
    input_ids=input_ids,
    max_new_tokens=100,
    temperature=temperature,
    do_sample=temperature > 0.0,
    top_p=top_p,
    top_k=top_k,
    repetition_penalty=repetition_penalty,
    stopping_criteria=StoppingCriteriaList([stop]),
)
for line in op:
    print(tok.decode(line))
```
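
The decoded text includes the prompt you sent in, so to pull out just the model's answer you can split on the assistant marker. This is a small post-processing sketch, not part of `pefttester.py`:

```python
# Post-processing sketch (not part of pefttester.py): keep only the assistant's reply.
full_text = tok.decode(op[0], skip_special_tokens=True)
reply = full_text.split("### Assistant: ")[-1]
# If generation ran past the turn, trim at the next "### Human:" marker.
reply = reply.split("### Human:")[0].strip()
print(reply)
```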
|
|