Update README.md
README.md CHANGED
@@ -11,9 +11,9 @@ tags:
 # 🔥 Quantized Model: Mistral-Small-24B-Instruct-2501_gptq_g32_4bit 🔥
 
 This is a 4-bit quantized version of the [mistralai/Mistral-Small-24B-Instruct-2501](https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501) model, quantized by [ConfidentialMind.com](https://www.confidentialmind.com) 🤖✨
-It leverages the open-source GPTQModel quantization to achieve 4-bit precision with a group size of
+It leverages the open-source GPTQModel quantization to achieve 4-bit precision with a group size of 32, resulting in a
 smaller,
-faster model with minimal performance degradation.
+faster model with minimal performance degradation. The G128 variant used MSE loss to avoid performance degradation.
 
 Ran on a single NVIDIA A100 GPU with 80GB of VRAM.
 
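For context, here is a minimal sketch of how a 4-bit, group-size-32 GPTQ checkpoint like this one is typically produced with the open-source GPTQModel library. The calibration dataset, sample count, and output path below are illustrative assumptions; the repo does not publish its exact quantization script, and the MSE option mentioned for the G128 variant is omitted here.

```python
from datasets import load_dataset
from gptqmodel import GPTQModel, QuantizeConfig

model_id = "mistralai/Mistral-Small-24B-Instruct-2501"
quant_path = "Mistral-Small-24B-Instruct-2501_gptq_g32_4bit"

# A small text corpus for calibration; any representative text works.
calibration = load_dataset(
    "allenai/c4",
    data_files="en/c4-train.00001-of-01024.json.gz",
    split="train",
).select(range(512))["text"]

# 4-bit weights with a group size of 32 (the "g32" in the model name).
quant_config = QuantizeConfig(bits=4, group_size=32)

model = GPTQModel.load(model_id, quant_config)
model.quantize(calibration, batch_size=1)  # memory-heavy; the card notes an A100 80GB
model.save(quant_path)
```

The saved folder can then be reloaded for inference with `GPTQModel.load(quant_path)`.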