fix mistake
README.md
@@ -36,7 +36,7 @@ The model comes in two versions:
 The model architecture is a modern Transformer decoder featuring Grouped-Query Attention (GQA), RoPE, and RMSNorm, making it efficient and performant for its size.

-*Note on parameter count: While the model name is `130M` for simplicity, the actual parameter count is
+*Note on parameter count: While the model name is `130M` for simplicity, the actual parameter count is 127.17 million.*

 ## 📊 Evaluation
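The `127.17 million` figure in the corrected note can be reproduced by counting a checkpoint's parameters directly. A minimal sketch, assuming a Hugging Face-style causal LM checkpoint (the model ID below is a placeholder, not the actual repo name):

```python
# Minimal sketch: count the exact parameters of a causal LM checkpoint.
# NOTE: "your-org/model-130M" is a hypothetical model ID used for illustration.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("your-org/model-130M")

# Sum the element counts of every weight tensor in the model.
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.2f}M parameters")  # the README note quotes 127.17M
```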