lemonilia committed
Commit 13a4bd5 · verified · 1 Parent(s): 2d964ae

Update README.md

Files changed (1):
  1. README.md +1 -1
README.md CHANGED
@@ -11,7 +11,7 @@ Following suggestions from section 6.2 in the [Llama-3 paper](https://arxiv.org/
  - The entirety of the first and final transformer layers are in BF16 precision;
  - Intermediate feed-forward network (FFN) layers are in _uniformly **low**_ precision.
 
- For the same total model size, computed perplexity values _do not appear to better_ than smaller standard GGUF quantizations, but supposedly this quantization scheme might help with real-world long-context performance and complex tasks while keeping size limited. Your mileage may vary.
+ For the same total model size, computed perplexity values _do not appear to be better_ than smaller standard GGUF quantizations, but supposedly this quantization scheme might help with real-world long-context performance and complex tasks while keeping size limited. Your mileage may vary.
 
  ## KL divergence testing
  Computed using `llama-perplexity` on a custom text file over 4 chunks, n_ctx=2048, batch_size=2048, n_seq=1. Some results are strange but remained the same after repeating them several times. They might have been the result of using a short test file.
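
For context, the KL-divergence numbers described in the README excerpt above can be reproduced with llama.cpp's `llama-perplexity` tool using its standard two-pass workflow (save base logits from a reference model, then compare a quantized model against them). The commit does not record the exact command the author ran, so the sketch below is only a plausible invocation: model and text file names are placeholders, and only the settings mentioned in the README (4 chunks, n_ctx=2048, batch_size=2048) are taken from the source.

```bash
# Illustrative sketch only -- the commit does not record the exact command used.
# Model and file names are placeholders; flags are standard llama.cpp options.

# 1) Save reference logits from the unquantized / BF16 model.
./llama-perplexity \
  -m model-bf16.gguf \
  -f test-text.txt \
  -c 2048 -b 2048 --chunks 4 \
  --kl-divergence-base logits-base.kld

# 2) Evaluate a quantized variant against those logits (perplexity + KL divergence).
#    With -c 2048 and -b 2048, each batch holds a single sequence (n_seq=1).
./llama-perplexity \
  -m model-quantized.gguf \
  -f test-text.txt \
  -c 2048 -b 2048 --chunks 4 \
  --kl-divergence-base logits-base.kld \
  --kl-divergence
```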