Following suggestions from section 6.2 in the [Llama-3 paper](https://arxiv.org/):

- The first and final transformer layers are kept entirely in BF16 precision;
- Intermediate feed-forward network (FFN) layers are in _uniformly **low**_ precision (see the sketch after this list).

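To make the layout concrete, here is a minimal Python sketch of the per-tensor rule described by the two bullets. It is illustrative only: `tensor_precision` is a hypothetical helper, the tensor names follow the usual GGUF `blk.<i>.*` convention, and `IQ2_XXS` is a stand-in for whichever low-bit type is actually used; the real work is done by llama.cpp's quantization tooling, not this snippet.

```python
# Illustrative sketch of the mixed-precision layout described above.
# "IQ2_XXS" is a placeholder low-bit type, not necessarily the one used here.

def tensor_precision(name: str, layer: int, n_layers: int) -> str | None:
    if layer in (0, n_layers - 1):  # entire first and final layers stay in BF16
        return "BF16"
    if ".ffn_" in name:             # intermediate FFN tensors: uniformly low-bit
        return "IQ2_XXS"
    return None                     # other tensors: not covered by the rules above

n_layers = 32  # e.g. a 32-block Llama-style model
for name in ("blk.0.attn_q.weight", "blk.15.ffn_up.weight", "blk.31.ffn_down.weight"):
    layer = int(name.split(".")[1])
    print(f"{name} -> {tensor_precision(name, layer, n_layers)}")
```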
For the same total model size, computed perplexity values _do not appear to be better_ than those of smaller standard GGUF quantizations, but this quantization scheme might still help with real-world long-context performance and complex tasks while keeping size limited. Your mileage may vary.

## KL divergence testing
Computed using `llama-perplexity` on a custom text file over 4 chunks (n_ctx=2048, batch_size=2048, n_seq=1). Some results are strange but remained unchanged across several repeated runs; they may be an artifact of the short test file.
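For context, the statistic reported here is the token-level KL divergence D_KL(P_base || P_quant) between the next-token distributions of the full-precision reference model and the quantized model; `llama-perplexity` can compute this itself (via its `--kl-divergence-base`/`--kl-divergence` options, to the best of my knowledge of current llama.cpp). The NumPy sketch below only illustrates the math, using random stand-in logits rather than real model outputs.

```python
# Token-level KL divergence D_KL(P_base || P_quant) between two models'
# next-token distributions, computed from raw logits via log-softmax.
# The logits below are random stand-ins, not real model outputs.

import numpy as np

def log_softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def kl_divergence(base_logits: np.ndarray, quant_logits: np.ndarray) -> np.ndarray:
    log_p = log_softmax(base_logits)   # reference (full-precision) model
    log_q = log_softmax(quant_logits)  # quantized model
    return (np.exp(log_p) * (log_p - log_q)).sum(axis=-1)

rng = np.random.default_rng(0)
base = rng.normal(size=(8, 32000))                      # (n_tokens, vocab_size)
quant = base + rng.normal(scale=0.05, size=base.shape)  # mimic quantization noise
print("mean KL per token:", kl_divergence(base, quant).mean())
```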