Update README.md
README.md CHANGED

@@ -39,7 +39,7 @@ LMDeploy supports the following NVIDIA GPU for W4A16 inference:
 Before proceeding with the quantization and inference, please ensure that lmdeploy is installed.
 
 ```shell
-pip install lmdeploy>=0.
+pip install lmdeploy>=0.6.4
 ```
 
 This article comprises the following sections:
@@ -74,7 +74,7 @@ For more information about the pipeline parameters, please refer to [here](https
 LMDeploy's `api_server` enables models to be easily packed into services with a single command. The provided RESTful APIs are compatible with OpenAI's interfaces. Below is an example of service startup:
 
 ```shell
-lmdeploy serve api_server OpenGVLab/InternVL-Chat-V1-5-AWQ --
+lmdeploy serve api_server OpenGVLab/InternVL-Chat-V1-5-AWQ --server-port 23333 --model-format awq
 ```
 
 To use the OpenAI-style interface, you need to install OpenAI:
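Since the service started above exposes an OpenAI-compatible RESTful API on port 23333, a client can talk to it with a plain HTTP POST. The sketch below is illustrative, not part of LMDeploy: the base URL assumes a local deployment, and the helper names are hypothetical.

```python
# Hypothetical sketch: calling the OpenAI-compatible chat endpoint exposed by
# `lmdeploy serve api_server ... --server-port 23333`.
# BASE_URL and the model name are assumptions; adjust them to your deployment.
import json
from urllib import request

BASE_URL = "http://0.0.0.0:23333/v1"

def build_chat_request(model, prompt):
    """Build an OpenAI-style chat-completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(model, prompt):
    """POST the request to the running api_server and return the reply text."""
    body = json.dumps(build_chat_request(model, prompt)).encode()
    req = request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Example (requires the server from the diff above to be running):
# print(chat("OpenGVLab/InternVL-Chat-V1-5-AWQ", "Hello"))
```

The `openai` Python package can be pointed at the same endpoint via its `base_url` parameter, which is why the README goes on to suggest installing OpenAI.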