Update README.md
README.md CHANGED
@@ -13,4 +13,4 @@ AWQ of the DeepSeek V3 chat model.
 
 This quant modified some of the model code to fix the overflow issue when using float16.
 
-Tested on vLLM with 8x H100, inference speed 5 tokens
+Tested on vLLM with 8x H100, inference speed 5 tokens per second with batch size 1 and short prompt, 12 tokens per second when using `moe_wna16` kernel.
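The updated line gives two throughput figures. Converting them into per-token latency shows what the `moe_wna16` kernel gains at batch size 1. This is only a back-of-envelope sketch. It assumes both rates are steady-state decode throughput under the same conditions (8x H100, batch size 1, short prompt), which the README implies but does not spell out. The constant and function names are illustrative and do not come from vLLM.

```python
# Throughput figures quoted in the README change (8x H100, batch size 1, short prompt).
# Assumption: both numbers are steady-state decode rates measured the same way.
BASELINE_TOK_PER_S = 5.0     # without the moe_wna16 kernel (kernel used is not stated)
MOE_WNA16_TOK_PER_S = 12.0   # with the moe_wna16 kernel


def per_token_latency_ms(tokens_per_second: float) -> float:
    """Average time to generate one token, in milliseconds."""
    return 1000.0 / tokens_per_second


speedup = MOE_WNA16_TOK_PER_S / BASELINE_TOK_PER_S

if __name__ == "__main__":
    print(f"baseline:  {per_token_latency_ms(BASELINE_TOK_PER_S):.1f} ms/token")
    print(f"moe_wna16: {per_token_latency_ms(MOE_WNA16_TOK_PER_S):.1f} ms/token")
    print(f"speedup:   {speedup:.1f}x")
```

Running it prints roughly 200 ms/token versus 83 ms/token, a 2.4x speedup. That holds only for single-request decoding; the README gives no figures for larger batches or long prompts.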