|
Slightly modified mpt-30b, with some updates (gradient checkpointing, etc.) to make it compatible with the qlora training code.
|
|
|
Original model: https://huggingface.co/mosaicml/mpt-30b |
|
|
|
My fork of qlora with mpt-30b support: https://github.com/jondurbin/qlora |
|
|
|
Differences in the qlora scripts: |
|
|
|
- requires adding `--mpt True` for mpt-based models |
|
- uses `--num_train_epochs` instead of `--max_steps` |
|
- uses the airoboros prompt format (mostly 1:1 with vicuna) rather than alpaca, and expects an input file in JSONL format with "instruction" and "response" keys (see the example below)
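
Each line of the input file is a standalone JSON object with those two keys; an illustrative (made-up) example line:

```
{"instruction": "Explain the difference between a list and a tuple in Python.", "response": "A list is mutable and can be modified after creation, while a tuple is immutable..."}
```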
|
|
|
__I think there's a bug in gradient accumulation, so if you try this, consider setting gradient accumulation steps to 1.__
|
|
|
*My first attempts used batch size 6 with gradient accumulation steps 16, but after three epochs the results with gradient accumulation were quite a bit worse than without it.*
|
|
|
__5 epochs seemed to achieve the best results, but YMMV__ |
|
|
|
Full example of tuning (used for airoboros-mpt-30b-gpt4-1.4): |
|
|
|
```
source /workspace/venv/bin/activate
export PYTHONPATH=./mpt-30b
export WANDB_API_KEY=[redacted]
export WANDB_PROJECT=airoboros-mpt-30b-gpt4-1.4

python qlora.py \
    --model_name_or_path ./mpt-30b \
    --output_dir ./$WANDB_PROJECT-checkpoints \
    --num_train_epochs 5 \
    --logging_steps 1 \
    --save_strategy steps \
    --data_seed 11422 \
    --save_steps 100 \
    --save_total_limit 3 \
    --evaluation_strategy "no" \
    --eval_dataset_size 2 \
    --max_new_tokens 8192 \
    --dataloader_num_workers 3 \
    --logging_strategy steps \
    --remove_unused_columns False \
    --do_train \
    --lora_r 64 \
    --lora_alpha 16 \
    --lora_modules all \
    --double_quant \
    --quant_type nf4 \
    --bf16 \
    --bits 4 \
    --warmup_ratio 0.03 \
    --lr_scheduler_type constant \
    --dataset ./instructions.jsonl \
    --dataset_format airoboros \
    --model_max_len 8192 \
    --gradient_checkpointing \
    --per_device_train_batch_size 6 \
    --gradient_accumulation_steps 1 \
    --learning_rate 0.0001 \
    --adam_beta2 0.999 \
    --max_grad_norm 0.3 \
    --lora_dropout 0.05 \
    --weight_decay 0.0 \
    --seed 11422 \
    --trust_remote_code \
    --mpt True \
    --report_to wandb
```
|
|
|
### Merged model |
|
|
|
To merge the LoRA adapter into the base model, run the `merge_weights.py` script in the qlora fork: https://github.com/jondurbin/qlora/blob/main/merge_weights.py
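
Conceptually, the merge amounts to something like the minimal `peft` sketch below. This is *not* the actual `merge_weights.py` script (check that script for the real arguments); the adapter checkpoint and output paths here are placeholders.

```
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the (modified) base model in bf16; trust_remote_code is needed for MPT's custom code.
base = AutoModelForCausalLM.from_pretrained(
    "./mpt-30b",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

# Placeholder: point this at the adapter checkpoint produced by qlora.py.
adapter_path = "./airoboros-mpt-30b-gpt4-1.4-checkpoints/checkpoint-XXX"

# Fold the LoRA deltas into the base weights and save a standalone model.
model = PeftModel.from_pretrained(base, adapter_path)
model = model.merge_and_unload()
model.save_pretrained("./merged")

# Save the tokenizer alongside the merged weights.
AutoTokenizer.from_pretrained("./mpt-30b", trust_remote_code=True).save_pretrained("./merged")
```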
|
|
|
Then, copy all of the original Python files from the mpt-30b repo into your output directory so the merged model can be loaded with `trust_remote_code=True`: https://huggingface.co/mosaicml/mpt-30b/tree/main
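
For example, assuming a local `./mpt-30b` clone (as used in the training command above) and a merge output directory of `./merged`, the copy step could look like:

```
import glob
import shutil

# Copy MPT's custom modeling code (modeling_mpt.py, attention.py, etc.) next to the
# merged weights so the model can be loaded with trust_remote_code=True.
for path in glob.glob("./mpt-30b/*.py"):
    shutil.copy(path, "./merged")
```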