Update README.md
README.md
@@ -63,7 +63,7 @@ Models are released as sharded safetensors files.
 Documentation on installing and using vLLM [can be found here](https://vllm.readthedocs.io/en/latest/).
 - When using vLLM as a server, pass the `--quantization awq` parameter, for example:
 ```shell
-python3 -m vllm.entrypoints.api_server --model
+python3 -m vllm.entrypoints.api_server --model Heng666/Breeze-7B-Instruct-v0_1-AWQ --quantization awq --dtype half
 ```
 Note: at the time of writing, vLLM has not yet done a new release with support for the `quantization` parameter.
 If you try the code below and get an error about `quantization` being unrecognised, please install vLLM from Github source.
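For context, a minimal sketch of the workflow the note above describes: installing vLLM from GitHub source when the released version does not yet recognise `--quantization`, starting the server with the command from this diff, and sending a test request. The `/generate` endpoint, the default port 8000, and the request fields are assumptions based on vLLM's demo API server at the time of writing; check the documentation for your installed version.

```shell
# Install vLLM from GitHub source if the released version rejects --quantization
# (assumes a CUDA-capable environment; building the kernels can take a while):
pip install git+https://github.com/vllm-project/vllm.git

# Start the AWQ-quantized server, as in the diff above:
python3 -m vllm.entrypoints.api_server --model Heng666/Breeze-7B-Instruct-v0_1-AWQ --quantization awq --dtype half

# Send a test request to the demo /generate endpoint (assumed default port 8000):
curl http://localhost:8000/generate \
    -H "Content-Type: application/json" \
    -d '{"prompt": "San Francisco is a", "max_tokens": 64, "temperature": 0.7}'
```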