Update README.md
README.md
@@ -63,7 +63,7 @@ Models are released as sharded safetensors files.
 Documentation on installing and using vLLM [can be found here](https://vllm.readthedocs.io/en/latest/).
 - When using vLLM as a server, pass the `--quantization awq` parameter, for example:
 ```shell
-python3 -m vllm.entrypoints.api_server --model
+python3 -m vllm.entrypoints.api_server --model Heng666/Breeze-7B-Instruct-v0_1-AWQ --quantization awq --dtype half
 ```
 Note: at the time of writing, vLLM has not yet done a new release with support for the `quantization` parameter.
 If you try the code below and get an error about `quantization` being unrecognised, please install vLLM from Github source.
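For context, a minimal sketch of the workflow the note above describes: installing vLLM from GitHub source when the released version does not yet recognise `--quantization`, starting the server with the command from this diff, and sending a test request. The `/generate` endpoint, the default port 8000, and the request fields are assumptions based on vLLM's demo API server at the time of writing; check the documentation for your installed version.

```shell
# Install vLLM from GitHub source if the released version rejects --quantization
# (assumes a CUDA-capable environment; building the kernels can take a while):
pip install git+https://github.com/vllm-project/vllm.git

# Start the AWQ-quantized server, as in the diff above:
python3 -m vllm.entrypoints.api_server --model Heng666/Breeze-7B-Instruct-v0_1-AWQ --quantization awq --dtype half

# Send a test request to the demo /generate endpoint (assumed default port 8000):
curl http://localhost:8000/generate \
    -H "Content-Type: application/json" \
    -d '{"prompt": "San Francisco is a", "max_tokens": 64, "temperature": 0.7}'
```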