Update README.md
README.md CHANGED
@@ -11,9 +11,9 @@ tags:
 # 🔥 Quantized Model: Mistral-Small-24B-Instruct-2501_gptq_g32_4bit 🔥
 
 This is a 4-bit quantized version of the [mistralai/Mistral-Small-24B-Instruct-2501](https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501) model, quantized by [ConfidentialMind.com](https://www.confidentialmind.com) 🤖✨
-It leverages the open-source GPTQModel quantization to achieve 4-bit precision with a group size of
+It leverages the open-source GPTQModel quantization to achieve 4-bit precision with a group size of 32, resulting in a
 smaller,
-faster model with minimal performance degradation.
+faster model with minimal performance degradation. The G128 variant used MSE loss to avoid performance degradation.
 
 Ran on a single NVIDIA A100 GPU with 80GB of VRAM.
 
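For context, here is a minimal sketch of how a 4-bit, group-size-32 GPTQ checkpoint like this one is typically produced with the open-source GPTQModel library. The calibration dataset, sample count, and output path below are illustrative assumptions; the repo does not publish its exact quantization script, and the MSE option mentioned for the G128 variant is omitted here.

```python
from datasets import load_dataset
from gptqmodel import GPTQModel, QuantizeConfig

model_id = "mistralai/Mistral-Small-24B-Instruct-2501"
quant_path = "Mistral-Small-24B-Instruct-2501_gptq_g32_4bit"

# A small text corpus for calibration; any representative text works.
calibration = load_dataset(
    "allenai/c4",
    data_files="en/c4-train.00001-of-01024.json.gz",
    split="train",
).select(range(512))["text"]

# 4-bit weights with a group size of 32 (the "g32" in the model name).
quant_config = QuantizeConfig(bits=4, group_size=32)

model = GPTQModel.load(model_id, quant_config)
model.quantize(calibration, batch_size=1)  # memory-heavy; the card notes an A100 80GB
model.save(quant_path)
```

The saved folder can then be reloaded for inference with `GPTQModel.load(quant_path)`.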