lemonilia committed on
Commit bd065d7 · verified · 1 Parent(s): 93ccc51

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -9,6 +9,6 @@ Following quantization suggestions in section 6.2 in the [Llama-3 paper](https:/
  - `token_embd` and `output` are in F16 precision;
  - The attention layers are in F16 precision;
  - The entirety of the first and final transformer layers are in F16 precision;
- - The feed-forward network (FFN) layers are in low precision.
+ - The feed-forward network (FFN) layers are in **low** precision.
 
  For the same total model size, perplexity values might not be favorable compared to more uniformly quantized models, but supposedly this quantization scheme might help with real-world long-context performance and complex tasks. Your mileage may vary.
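
As a rough illustration of the mixed-precision layout described in the bullets above, the sketch below maps GGUF-style tensor names (llama.cpp conventions such as `token_embd.weight`, `blk.<i>.attn_q.weight`, `blk.<i>.ffn_up.weight`) to a target precision. The helper, the choice of `Q4_K` as the "low precision" type, and the treatment of norm tensors are assumptions for illustration only, not part of this commit or the underlying repository.

```python
# Illustrative sketch only: assigns a target precision to each tensor under the
# scheme described in the README. Tensor-name patterns follow llama.cpp/GGUF
# conventions; the helper itself and the Q4_K choice are hypothetical.
import re


def choose_precision(name: str, n_layers: int, low_prec: str = "Q4_K") -> str:
    """Return the target precision for one tensor under this scheme."""
    # Token embeddings and the output head stay in F16.
    if name in ("token_embd.weight", "output.weight"):
        return "F16"
    m = re.match(r"blk\.(\d+)\.", name)
    if m:
        layer = int(m.group(1))
        # The entire first and final transformer layers stay in F16.
        if layer in (0, n_layers - 1):
            return "F16"
        # Attention tensors stay in F16 in every layer.
        if ".attn_" in name:
            return "F16"
        # Feed-forward (FFN) tensors get the low-precision type.
        if ".ffn_" in name:
            return low_prec
    # Anything else (norms, etc.) is left in F16 here for simplicity (assumption).
    return "F16"


if __name__ == "__main__":
    for t in ("token_embd.weight",
              "blk.0.ffn_up.weight",
              "blk.5.attn_q.weight",
              "blk.5.ffn_down.weight"):
        print(t, "->", choose_precision(t, n_layers=32))
```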