fix mistake
README.md
@@ -36,7 +36,7 @@ The model comes in two versions:
 The model architecture is a modern Transformer decoder featuring Grouped-Query Attention (GQA), RoPE, and RMSNorm, making it efficient and performant for its size.

-*Note on parameter count: While the model name is `130M` for simplicity, the actual parameter count is
+*Note on parameter count: While the model name is `130M` for simplicity, the actual parameter count is 127.17 million.*

 ## 📊 Evaluation
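The `127.17 million` figure in the corrected note can be reproduced by counting a checkpoint's parameters directly. A minimal sketch, assuming a Hugging Face-style causal LM checkpoint (the model ID below is a placeholder, not the actual repo name):

```python
# Minimal sketch: count the exact parameters of a causal LM checkpoint.
# NOTE: "your-org/model-130M" is a hypothetical model ID used for illustration.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("your-org/model-130M")

# Sum the element counts of every weight tensor in the model.
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.2f}M parameters")  # the README note quotes 127.17M
```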