Nikity committed on
Commit
fb1174c
·
verified ·
1 Parent(s): 952679b

fix mistake

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -36,7 +36,7 @@ The model comes in two versions:
 
 The model architecture is a modern Transformer decoder featuring Grouped-Query Attention (GQA), RoPE, and RMSNorm, making it efficient and performant for its size.
 
-*Note on parameter count: While the model name is `130M` for simplicity, the actual parameter count is closer to 140 million.*
+*Note on parameter count: While the model name is `130M` for simplicity, the actual parameter count is 127.17 million.*
 
 ## 📊 Evaluation
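A figure like the corrected 127.17 million comes from summing the weight-matrix sizes of the decoder. Below is a minimal sketch of that accounting for a GQA decoder with SwiGLU MLPs and RMSNorm; the configuration values in the example call are entirely made up for illustration (the model's real vocab size, depth, and widths are not stated in this commit), and the function name is hypothetical.

```python
def count_params(vocab: int, d: int, n_layers: int, n_heads: int,
                 n_kv_heads: int, d_ff: int, tie_embeddings: bool = True) -> int:
    """Approximate parameter count for a GQA Transformer decoder (biases omitted)."""
    head_dim = d // n_heads
    kv_dim = n_kv_heads * head_dim          # K/V projections are narrower under GQA
    attn = d * d + 2 * d * kv_dim + d * d   # Wq, Wk, Wv, Wo
    mlp = 3 * d * d_ff                      # SwiGLU: gate, up, down projections
    norms = 2 * d                           # two RMSNorm weight vectors per layer
    per_layer = attn + mlp + norms
    total = vocab * d + n_layers * per_layer + d  # embeddings + layers + final norm
    if not tie_embeddings:
        total += vocab * d                  # separate LM head
    return total

# Tiny made-up config, just to exercise the formula:
print(count_params(vocab=10, d=8, n_layers=1, n_heads=4, n_kv_heads=2, d_ff=16))
# → 680
```

With a realistic vocabulary and width, the embedding table alone can account for tens of millions of parameters, which is why rounded model names ("130M") routinely drift from the exact count.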