Update README.md
README.md
CHANGED
```diff
@@ -9,6 +9,6 @@ Following quantization suggestions in section 6.2 in the [Llama-3 paper](https:/
 - `token_embd` and `output` are in F16 precision;
 - The attention layers are in F16 precision;
 - The entirety of the first and final transformer layers are in F16 precision;
-- The feed-forward network (FFN) layers are in low precision.
+- The feed-forward network (FFN) layers are in **low** precision.
 
 For the same total model size, perplexity values might not be favorable compared to more uniformly quantized models, but supposedly this quantization scheme might help with real-world long-context performance and complex tasks. Your mileage may vary.
```
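For readers who want the layer-to-precision mapping spelled out, below is a minimal Python sketch of the scheme the diff describes. It is an illustration, not the code used to produce these files: the GGUF-style tensor names (`token_embd`, `output`, `blk.N.attn_*`, `blk.N.ffn_*`), the `N_LAYERS` value, and the `LOW_PRECISION` choice are all assumptions.

```python
import re

N_LAYERS = 32          # assumption: total transformer blocks in the model
LOW_PRECISION = "q3_k" # assumption: the "low" quant type used for FFN weights

def quant_type(tensor_name: str) -> str:
    """Pick a quantization type per tensor, following the mix above."""
    # Embedding and output tensors stay at F16.
    if tensor_name.startswith(("token_embd", "output")):
        return "f16"
    m = re.match(r"blk\.(\d+)\.(\w+)", tensor_name)
    if m is None:
        return "f16"  # norms, biases, anything unrecognized: keep high precision
    layer, kind = int(m.group(1)), m.group(2)
    # The entirety of the first and final transformer layers stays at F16.
    if layer in (0, N_LAYERS - 1):
        return "f16"
    # Attention tensors stay at F16 in every layer.
    if kind.startswith("attn"):
        return "f16"
    # Feed-forward tensors in the middle layers take the low-precision quant.
    if kind.startswith("ffn"):
        return LOW_PRECISION
    return "f16"

print(quant_type("blk.0.ffn_up.weight"))    # f16 (first layer kept whole)
print(quant_type("blk.5.attn_q.weight"))    # f16 (attention)
print(quant_type("blk.5.ffn_down.weight"))  # q3_k (mid-layer FFN)
```

In practice, llama.cpp's `llama-quantize` exposes related per-tensor controls (e.g. `--token-embedding-type` and `--output-tensor-type`) that cover part of this mapping.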