Following suggestions from section 6.2 in the [Llama-3 paper](https://arxiv.org/):

- The first and final transformer layers are kept entirely in BF16 precision;
- Intermediate feed-forward network (FFN) layers are in _uniformly **low**_ precision (see the sketch after this list).

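To make the layout concrete, here is a minimal Python sketch of the per-tensor rule described by the two bullets. It is illustrative only: `tensor_precision` is a hypothetical helper, the tensor names follow the usual GGUF `blk.<i>.*` convention, and `IQ2_XXS` is a stand-in for whichever low-bit type is actually used; the real work is done by llama.cpp's quantization tooling, not this snippet.

```python
# Illustrative sketch of the mixed-precision layout described above.
# "IQ2_XXS" is a placeholder low-bit type, not necessarily the one used here.

def tensor_precision(name: str, layer: int, n_layers: int) -> str | None:
    if layer in (0, n_layers - 1):  # entire first and final layers stay in BF16
        return "BF16"
    if ".ffn_" in name:             # intermediate FFN tensors: uniformly low-bit
        return "IQ2_XXS"
    return None                     # other tensors: not covered by the rules above

n_layers = 32  # e.g. a 32-block Llama-style model
for name in ("blk.0.attn_q.weight", "blk.15.ffn_up.weight", "blk.31.ffn_down.weight"):
    layer = int(name.split(".")[1])
    print(f"{name} -> {tensor_precision(name, layer, n_layers)}")
```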
For the same total model size, computed perplexity values _do not appear to be better_ than those of smaller standard GGUF quantizations, but this quantization scheme might still help with real-world long-context performance and complex tasks while keeping size limited. Your mileage may vary.

## KL divergence testing
Computed using `llama-perplexity` on a custom text file over 4 chunks (n_ctx=2048, batch_size=2048, n_seq=1). Some results are strange but remained unchanged across several repeated runs; they may be an artifact of the short test file.
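For context, the statistic reported here is the token-level KL divergence D_KL(P_base || P_quant) between the next-token distributions of the full-precision reference model and the quantized model; `llama-perplexity` can compute this itself (via its `--kl-divergence-base`/`--kl-divergence` options, to the best of my knowledge of current llama.cpp). The NumPy sketch below only illustrates the math, using random stand-in logits rather than real model outputs.

```python
# Token-level KL divergence D_KL(P_base || P_quant) between two models'
# next-token distributions, computed from raw logits via log-softmax.
# The logits below are random stand-ins, not real model outputs.

import numpy as np

def log_softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def kl_divergence(base_logits: np.ndarray, quant_logits: np.ndarray) -> np.ndarray:
    log_p = log_softmax(base_logits)   # reference (full-precision) model
    log_q = log_softmax(quant_logits)  # quantized model
    return (np.exp(log_p) * (log_p - log_q)).sum(axis=-1)

rng = np.random.default_rng(0)
base = rng.normal(size=(8, 32000))                      # (n_tokens, vocab_size)
quant = base + rng.normal(scale=0.05, size=base.shape)  # mimic quantization noise
print("mean KL per token:", kl_divergence(base, quant).mean())
```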