shuyuej committed (verified)
Commit 3a7f7f7 · 1 Parent(s): 0ad7673

Update README.md

Files changed (1): README.md (+5 −1)
@@ -79,7 +79,11 @@ Please check [here](https://docs.vllm.ai/en/stable/models/engine_args.html) if y
 If you would like to deploy your LoRA adapter, please refer to the [vLLM documentation](https://docs.vllm.ai/en/latest/usage/lora.html#serving-lora-adapters) for a detailed guide.<br>
 It provides step-by-step instructions on how to serve LoRA adapters effectively in a vLLM environment.<br>
 **We have also shared our trained LoRA adapter** [here](https://huggingface.co/shuyuej/Public-Shared-LoRA-for-Llama-3.3-70B-Instruct-GPTQ). Please download it manually if needed.
+```shell
+git clone https://huggingface.co/shuyuej/Public-Shared-LoRA-for-Llama-3.3-70B-Instruct-GPTQ
+```
 
+Then, use vLLM to serve the base model with the LoRA adapter by including the `--enable-lora` flag and specifying `--lora-modules`:
 ```shell
 vllm serve shuyuej/Llama-3.3-70B-Instruct-GPTQ \
     --quantization gptq \
@@ -90,7 +94,7 @@ vllm serve shuyuej/Llama-3.3-70B-Instruct-GPTQ \
     --pipeline-parallel-size 4 \
     --api-key token-abc123 \
     --enable-lora \
-    --lora-modules adapter=checkpoint-18640
+    --lora-modules adapter=Public-Shared-LoRA-for-Llama-3.3-70B-Instruct-GPTQ/checkpoint-18640
 ```
 
 Since this server is compatible with the OpenAI API, you can use it as a drop-in replacement for any application using the OpenAI API.
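As a minimal client-side sketch of that drop-in usage: the snippet below builds a chat-completions request against the served model. It assumes the server is running locally on vLLM's default port 8000, that the API key matches the `--api-key token-abc123` flag above, and that the LoRA adapter was registered under the name `adapter` via `--lora-modules`; to query the base model instead, set `"model"` to `shuyuej/Llama-3.3-70B-Instruct-GPTQ`.

```python
# Hypothetical client sketch for the OpenAI-compatible vLLM server above.
# Assumptions: server at http://localhost:8000 (vLLM default), API key from
# the --api-key flag, adapter name "adapter" from the --lora-modules flag.
import json

BASE_URL = "http://localhost:8000/v1"  # assumed local endpoint
API_KEY = "token-abc123"               # matches --api-key in the serve command

# Setting "model" to the adapter name routes the request through the LoRA
# adapter; the base model name would bypass it.
payload = {
    "model": "adapter",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 64,
}

body = json.dumps(payload)  # JSON request body for POST {BASE_URL}/chat/completions
print(body)

# Equivalent call with the official openai client (pip install openai):
#   from openai import OpenAI
#   client = OpenAI(base_url=BASE_URL, api_key=API_KEY)
#   completion = client.chat.completions.create(**payload)
#   print(completion.choices[0].message.content)
```

The request body, endpoint, and auth header are the standard OpenAI chat-completions shapes, which is why existing OpenAI-based applications can point at this server unchanged.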