Update README.md

README.md — CHANGED

@@ -15,6 +15,8 @@ It leverages the open-source GPTQModel quantization to achieve 4-bit precision w
 smaller,
 faster model with minimal performance degradation.
 
+NOTE: High perplexity, maybe due to MSE. Non-MSE quant either present or coming.
+
 Ran on a single NVIDIA A100 GPU with 80GB of VRAM.
 
 *Note* `batch_size` is set quite high as the model is small, you may need to adjust this to your GPU VRAM.
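
For context, a typical GPTQModel 4-bit quantization run looks like the sketch below. The model id, output path, calibration set, and `batch_size=32` are illustrative assumptions, not values from this repo; only the GPTQModel calls (`QuantizeConfig`, `GPTQModel.load`, `model.quantize`, `model.save`) follow the library's documented usage. Requires a CUDA GPU.

```python
from datasets import load_dataset
from gptqmodel import GPTQModel, QuantizeConfig

# Hypothetical model id and output path -- substitute the repo's actual values.
model_id = "meta-llama/Llama-3.2-1B-Instruct"
quant_path = "Llama-3.2-1B-Instruct-gptqmodel-4bit"

# A small calibration sample (here: C4 text, an assumed choice).
calibration_dataset = load_dataset(
    "allenai/c4",
    data_files="en/c4-train.00001-of-01024.json.gz",
    split="train",
).select(range(1024))["text"]

# 4-bit weights with the common group size of 128.
quant_config = QuantizeConfig(bits=4, group_size=128)

model = GPTQModel.load(model_id, quant_config)

# batch_size trades VRAM for speed: a small model on an 80 GB A100 tolerates a
# high value, but lower it if you hit out-of-memory errors on a smaller GPU.
model.quantize(calibration_dataset, batch_size=32)

model.save(quant_path)
```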