shuyuej committed (verified)
Commit 3a7f7f7 · 1 Parent(s): 0ad7673

Update README.md

Files changed (1): README.md (+5 −1)
@@ -79,7 +79,11 @@ Please check [here](https://docs.vllm.ai/en/stable/models/engine_args.html) if y
 If you would like to deploy your LoRA adapter, please refer to the [vLLM documentation](https://docs.vllm.ai/en/latest/usage/lora.html#serving-lora-adapters) for a detailed guide.<br>
 It provides step-by-step instructions on how to serve LoRA adapters effectively in a vLLM environment.<br>
 **We have also shared our trained LoRA adapter** [here](https://huggingface.co/shuyuej/Public-Shared-LoRA-for-Llama-3.3-70B-Instruct-GPTQ). Please download it manually if needed.
+```shell
+git clone https://huggingface.co/shuyuej/Public-Shared-LoRA-for-Llama-3.3-70B-Instruct-GPTQ
+```
 
+Then, use vLLM to serve the base model with the LoRA adapter by including the `--enable-lora` flag and specifying `--lora-modules`:
 ```shell
 vllm serve shuyuej/Llama-3.3-70B-Instruct-GPTQ \
     --quantization gptq \
@@ -90,7 +94,7 @@ vllm serve shuyuej/Llama-3.3-70B-Instruct-GPTQ \
     --pipeline-parallel-size 4 \
     --api-key token-abc123 \
     --enable-lora \
-    --lora-modules adapter=checkpoint-18640
+    --lora-modules adapter=Public-Shared-LoRA-for-Llama-3.3-70B-Instruct-GPTQ/checkpoint-18640
 ```
 
 Since this server is compatible with the OpenAI API, you can use it as a drop-in replacement for any application using the OpenAI API.
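As a minimal client-side sketch of that drop-in usage: the snippet below builds a chat-completions request against the served model. It assumes the server is running locally on vLLM's default port 8000, that the API key matches the `--api-key token-abc123` flag above, and that the LoRA adapter was registered under the name `adapter` via `--lora-modules`; to query the base model instead, set `"model"` to `shuyuej/Llama-3.3-70B-Instruct-GPTQ`.

```python
# Hypothetical client sketch for the OpenAI-compatible vLLM server above.
# Assumptions: server at http://localhost:8000 (vLLM default), API key from
# the --api-key flag, adapter name "adapter" from the --lora-modules flag.
import json

BASE_URL = "http://localhost:8000/v1"  # assumed local endpoint
API_KEY = "token-abc123"               # matches --api-key in the serve command

# Setting "model" to the adapter name routes the request through the LoRA
# adapter; the base model name would bypass it.
payload = {
    "model": "adapter",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 64,
}

body = json.dumps(payload)  # JSON request body for POST {BASE_URL}/chat/completions
print(body)

# Equivalent call with the official openai client (pip install openai):
#   from openai import OpenAI
#   client = OpenAI(base_url=BASE_URL, api_key=API_KEY)
#   completion = client.chat.completions.create(**payload)
#   print(completion.choices[0].message.content)
```

The request body, endpoint, and auth header are the standard OpenAI chat-completions shapes, which is why existing OpenAI-based applications can point at this server unchanged.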