# StableLM 2
This repository contains examples for training and data processing with StableLM-2. It also includes a section to help you estimate the GPU requirements for your specific use case.
## Estimating GPU Requirements
| type          | deepspeed | batch size | context length | vRAM (GB) |
|---------------|-----------|------------|----------------|-----------|
| full finetune | N/A       | 1          | 4096           | ~21.5     |
| full finetune | zero2     | 1          | 4096           | ~20       |
| lora          | N/A       | 1          | 4096           | ~16.6     |
The above are estimates and might differ slightly depending on your setup, for example whether or not you pack your sequence lengths (the numbers above assume packing to length 4096).
This blog post from Hamel Husain was a great resource for estimating these numbers: https://hamel.dev/notes/llm/03_estimating_vram.html
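
If you want to sanity-check these numbers for a different model size, a rough back-of-envelope (in the spirit of the blog post above) is bytes-per-parameter for weights, gradients, and optimizer state. The snippet below is only a sketch, assuming bf16 weights and gradients (2 bytes each) and roughly 8 bytes per parameter of Adam state; activations at a 4096 packed context add a few more GB on top.

```shell
# rough vRAM estimate for fully finetuning a 1.6B-parameter model:
#   2 bytes (bf16 weights) + 2 bytes (bf16 gradients) + ~8 bytes (Adam states) ≈ 12 bytes/param
python -c "print(f'{1.6e9 * 12 / 1e9:.1f} GB before activations')"
# prints 19.2 GB before activations, in the same ballpark as the ~21.5 GB measured above
```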
## Training
We have example configs here for both full finetuning and LoRA using the popular Alpaca dataset:
```shell
# preprocess the dataset
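# (CUDA_VISIBLE_DEVICES="" hides the GPUs so preprocessing runs on the CPU)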
CUDA_VISIBLE_DEVICES="" python -m axolotl.cli.preprocess examples/stablelm-2/1.6b/lora.yml
```
Single GPU Training:
```shell
python -m axolotl.cli.train examples/stablelm-2/1.6b/fft.yml --deepspeed deepspeed_configs/zero2.json
# OR
python -m axolotl.cli.train examples/stablelm-2/1.6b/lora.yml
```
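
To compare actual memory use against the estimates in the table above, you can watch `nvidia-smi` from a second terminal while training (assumes an NVIDIA GPU with the driver tools installed):

```shell
# refresh GPU utilization and memory every second during the run
watch -n 1 nvidia-smi
```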
Multinode GPU Training with `accelerate`:
```shell
# make sure you've configured accelerate properly
accelerate launch -m axolotl.cli.train examples/stablelm-2/1.6b/fft.yml --deepspeed deepspeed_configs/zero2.json
```
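
If you haven't configured `accelerate` yet, `accelerate config` walks you through it interactively and should be run on every node. Alternatively, the topology can be passed on the command line; the launch below is only a sketch with placeholder values for a hypothetical two-node, 8-GPU-per-node cluster, so adjust the machine count, process count, rank, and head-node address to match your own setup.

```shell
# run once per node and answer the multi-node / multi-GPU prompts
accelerate config

# or pass the topology explicitly (placeholder values: 2 nodes x 8 GPUs, head node at 10.0.0.1)
accelerate launch --num_machines 2 --num_processes 16 --machine_rank 0 \
    --main_process_ip 10.0.0.1 --main_process_port 29500 \
    -m axolotl.cli.train examples/stablelm-2/1.6b/fft.yml --deepspeed deepspeed_configs/zero2.json
```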