add dstack section (#1612) [skip ci]
* add dstack section
* chore: lint
---------
Co-authored-by: Wing Lian <[email protected]>
    	
        README.md
    CHANGED
    
    | @@ -34,6 +34,7 @@ Features: | |
| 34 | 
             
              - [Mac](#mac)
         | 
| 35 | 
             
              - [Google Colab](#google-colab)
         | 
| 36 | 
             
              - [Launching on public clouds via SkyPilot](#launching-on-public-clouds-via-skypilot)
         | 
|  | |
| 37 | 
             
            - [Dataset](#dataset)
         | 
| 38 | 
             
            - [Config](#config)
         | 
| 39 | 
             
              - [Train](#train)
         | 
| @@ -292,6 +293,42 @@ HF_TOKEN=xx sky launch axolotl.yaml --env HF_TOKEN | |
| 292 | 
             
            HF_TOKEN=xx BUCKET=<unique-name> sky spot launch axolotl-spot.yaml --env HF_TOKEN --env BUCKET
         | 
| 293 | 
             
            ```
         | 
| 294 |  | 
| 295 | 
             
            ### Dataset
         | 
| 296 |  | 
| 297 | 
             
            Axolotl supports a variety of dataset formats.  It is recommended to use a JSONL.  The schema of the JSONL depends upon the task and the prompt template you wish to use.  Instead of a JSONL, you can also use a HuggingFace dataset with columns for each JSONL field.
         | 
|  | |
| 34 | 
             
              - [Mac](#mac)
         | 
| 35 | 
             
              - [Google Colab](#google-colab)
         | 
| 36 | 
             
              - [Launching on public clouds via SkyPilot](#launching-on-public-clouds-via-skypilot)
         | 
| 37 | 
            +
              - [Launching on public clouds via dstack](#launching-on-public-clouds-via-dstack)
         | 
| 38 | 
             
            - [Dataset](#dataset)
         | 
| 39 | 
             
            - [Config](#config)
         | 
| 40 | 
             
              - [Train](#train)
         | 
|  | |
| 293 | 
             
            HF_TOKEN=xx BUCKET=<unique-name> sky spot launch axolotl-spot.yaml --env HF_TOKEN --env BUCKET
         | 
| 294 | 
             
            ```
         | 
| 295 |  | 
| 296 | 
            +
            #### Launching on public clouds via dstack
         | 
| 297 | 
            +
            To launch on GPU instances (both on-demand and spot) on public clouds (GCP, AWS, Azure, Lambda Labs, TensorDock, Vast.ai, and CUDO), you can use [dstack](https://dstack.ai/).
         | 
| 298 | 
            +
             | 
| 299 | 
            +
            Write a job description in YAML as below:
         | 
| 300 | 
            +
             | 
| 301 | 
            +
            ```yaml
         | 
| 302 | 
            +
            # dstack.yaml
         | 
| 303 | 
            +
            type: task
         | 
| 304 | 
            +
             | 
| 305 | 
            +
            image: winglian/axolotl-cloud:main-20240429-py3.11-cu121-2.2.1
         | 
| 306 | 
            +
             | 
| 307 | 
            +
            env:
         | 
| 308 | 
            +
              - HUGGING_FACE_HUB_TOKEN
         | 
| 309 | 
            +
              - WANDB_API_KEY
         | 
| 310 | 
            +
             | 
| 311 | 
            +
            commands:
         | 
| 312 | 
            +
              - accelerate launch -m axolotl.cli.train config.yaml
         | 
| 313 | 
            +
             | 
| 314 | 
            +
            ports:
         | 
| 315 | 
            +
              - 6006
         | 
| 316 | 
            +
             | 
| 317 | 
            +
            resources:
         | 
| 318 | 
            +
              gpu:
         | 
| 319 | 
            +
                memory: 24GB..
         | 
| 320 | 
            +
                count: 2
         | 
| 321 | 
            +
            ```
         | 
| 322 | 
            +
             | 
| 323 | 
            +
            Then run the job with the `dstack run` command. Append the `--spot` option if you want a spot instance. The `dstack run` command shows you the cheapest available instance across multiple cloud services:
         | 
| 324 | 
            +
             | 
| 325 | 
            +
            ```bash
         | 
| 326 | 
            +
            pip install dstack
         | 
| 327 | 
            +
            HUGGING_FACE_HUB_TOKEN=xxx WANDB_API_KEY=xxx dstack run . -f dstack.yaml # --spot
         | 
| 328 | 
            +
            ```
         | 
| 329 | 
            +
             | 
| 330 | 
            +
            For more advanced and fine-grained use cases, refer to the official [dstack documentation](https://dstack.ai/docs/) and the detailed [axolotl example](https://github.com/dstackai/dstack/tree/master/examples/fine-tuning/axolotl) in the official repository.
         | 
| 331 | 
            +
             | 
| 332 | 
             
            ### Dataset
         | 
| 333 |  | 
| 334 | 
             
            Axolotl supports a variety of dataset formats.  It is recommended to use a JSONL.  The schema of the JSONL depends upon the task and the prompt template you wish to use.  Instead of a JSONL, you can also use a HuggingFace dataset with columns for each JSONL field.
         | 
 
		