Nikity committed on
Commit
025ab01
·
verified ·
1 Parent(s): 4c3db4c

fix mistake

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -122,7 +122,7 @@ The model comes in two versions:
 
 The model architecture is a modern Transformer decoder featuring Grouped-Query Attention (GQA), RoPE, and RMSNorm, making it efficient and performant for its size.
 
-*Note on parameter count: While the model name is `130M` for simplicity, the actual parameter count is closer to 140 million.*
+*Note on parameter count: While the model name is `130M` for simplicity, the actual parameter count is 127.17 million.*
 
 ## 📊 Evaluation
 
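The corrected figure is the kind of number one gets by tallying a decoder's weight matrices layer by layer. A minimal sketch of such a tally for a GQA decoder with RMSNorm, assuming a SwiGLU-style MLP and tied embeddings (all dimensions below are hypothetical placeholders, not this model's actual config):

```python
# Hypothetical sketch of counting parameters in a GQA Transformer decoder.
# All dimensions used here are made-up placeholders, NOT this model's config.

def transformer_params(vocab_size: int, d_model: int, n_layers: int,
                       n_heads: int, n_kv_heads: int, d_ff: int) -> int:
    """Rough parameter count for a decoder with GQA, RMSNorm, and a SwiGLU MLP."""
    head_dim = d_model // n_heads
    # Attention: Q and O projections are full-width; K and V are shrunk by GQA.
    attn = 2 * d_model * d_model + 2 * d_model * (n_kv_heads * head_dim)
    # SwiGLU MLP: gate, up, and down projections.
    mlp = 3 * d_model * d_ff
    # Two RMSNorm weight vectors per layer (pre-attention, pre-MLP).
    norms = 2 * d_model
    per_layer = attn + mlp + norms
    # Token embeddings (output head assumed tied) plus a final RMSNorm.
    return vocab_size * d_model + n_layers * per_layer + d_model

# Example with placeholder dimensions:
print(transformer_params(32000, 768, 12, 12, 4, 2048))  # → 100092672
```

Counts like this rarely land exactly on a round number, which is why a checkpoint named `130M` can hold 127.17 million parameters: the name rounds for readability while the README reports the precise tally.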