Kunger/Sakura-13B-Qwen2beta-v0.9-4bit-GS64-AWQ

Kunger committed on Mar 20, 2024

Commit ce71c3a (verified) · 1 Parent(s): 0b47031

Update README.md

Files changed (1)

README.md +1 -1

@@ -8,4 +8,4 @@ license: cc-by-nc-sa-4.0
 GroupSize=64
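-vLLM dual-GPU inference is incompatible with AWQ; according to an issue report, setting GroupSize to 64 during quantization apparently resolves this.
+Suitable for dual-GPU inference on Kaggle.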
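
As a rough illustration of why the AWQ group size can matter for dual-GPU inference, the sketch below checks whether a weight dimension, once split evenly across GPUs, still contains a whole number of quantization groups. The helper name `shard_is_group_aligned` and the dimension `13696` are assumptions chosen for illustration; neither comes from this commit, and the actual checks performed by vLLM may differ.

```python
def shard_is_group_aligned(dim: int, num_gpus: int, group_size: int) -> bool:
    """Return True if `dim` splits evenly across GPUs and each shard
    holds a whole number of quantization groups."""
    if dim % num_gpus != 0:
        return False
    return (dim // num_gpus) % group_size == 0


# Hypothetical weight dimension, used for illustration only.
EXAMPLE_DIM = 13696

for group_size in (128, 64):
    aligned = shard_is_group_aligned(EXAMPLE_DIM, num_gpus=2, group_size=group_size)
    print(f"group_size={group_size}: aligned={aligned}")
```

Under these assumptions, each of the two shards has 6848 elements, which is a multiple of 64 but not of 128, so only the smaller group size keeps every group on a single GPU.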