Update README.md
README.md
CHANGED
@@ -72,16 +72,18 @@ print(tokenizer.decode(output))
 - **Model type:** Transformer-based Language Model
 - **Total seen tokens:** 2.1T tokens
 
-|Params|Layers|Hidden size|Heads|Context length|Embedding parameters|Non-embedding parameters|
-
-|150M|12|512|8
-|440M|16|1024|8
-|980M|20|1536|8
-|1.8b|24|2048|16
-|
-|
-|
-|
+|Params|Layers|Hidden size|Heads|Routed Experts|Activated Experts|Context length|Embedding parameters|Non-embedding parameters|Activated parameters|Total parameters|
+|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
+|150M|12|512|8|-|-|4096|101,874,688|50,344,448|152,219,136|152,219,136|
+|440M|16|1024|8|-|-|4096|203,749,376|243,303,424|447,052,800|447,052,800|
+|980M|20|1536|8|-|-|4096|305,624,064|684,258,816|989,882,880|989,882,880|
+|1.8b|24|2048|16|-|-|4096|407,498,752|1,459,718,144|1,867,216,896|1,867,216,896|
+|8x1.8b|24|2048|16|8|2|4096|407,498,752|8,858,863,616|2,924,279,808|9,266,362,368|
+|3.7b|28|3072|24|-|-|4096|611,248,128|3,171,068,928|3,782,317,056|3,782,317,056|
+|7.2b|32|4096|32|-|-|4096|814,997,504|6,476,271,616|7,291,269,120|7,291,269,120|
+|13b|40|5120|40|-|-|4096|1,018,746,880|12,688,184,320|13,706,931,200|13,706,931,200|
+|8x13b|40|5120|40|8|2|4096|1,018,746,880|72,144,081,920|22,200,806,400|73,162,828,800|
+|172b|96|12288|96|-|-|4096|2,444,992,512|169,947,181,056|172,392,173,568|172,392,173,568|
 
 ## Tokenizer
 
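The parameter columns in the added table can be cross-checked with a few lines of Python. This is only a sketch, not part of the commit: the counts are copied from the `+` rows of the diff, and the names `ROWS`, `MOE_ROWS`, and `find_inconsistencies` are hypothetical. It assumes that total parameters equal embedding plus non-embedding parameters, that dense models activate every parameter, and that the two routed-expert models activate fewer parameters than they hold in total.

```python
# Parameter counts copied from the "+" rows of the diff:
# (embedding, non-embedding, activated, total).
ROWS = {
    "150M": (101_874_688, 50_344_448, 152_219_136, 152_219_136),
    "440M": (203_749_376, 243_303_424, 447_052_800, 447_052_800),
    "980M": (305_624_064, 684_258_816, 989_882_880, 989_882_880),
    "1.8b": (407_498_752, 1_459_718_144, 1_867_216_896, 1_867_216_896),
    "8x1.8b": (407_498_752, 8_858_863_616, 2_924_279_808, 9_266_362_368),
    "3.7b": (611_248_128, 3_171_068_928, 3_782_317_056, 3_782_317_056),
    "7.2b": (814_997_504, 6_476_271_616, 7_291_269_120, 7_291_269_120),
    "13b": (1_018_746_880, 12_688_184_320, 13_706_931_200, 13_706_931_200),
    "8x13b": (1_018_746_880, 72_144_081_920, 22_200_806_400, 73_162_828_800),
    "172b": (2_444_992_512, 169_947_181_056, 172_392_173_568, 172_392_173_568),
}

# Rows listed with 8 routed experts and 2 activated experts.
MOE_ROWS = {"8x1.8b", "8x13b"}


def find_inconsistencies(rows, moe_rows):
    """Return a list of problem descriptions; empty if every row adds up."""
    problems = []
    for name, (embedding, non_embedding, activated, total) in rows.items():
        # Assumption: the total is the sum of the two parameter groups.
        if embedding + non_embedding != total:
            problems.append(f"{name}: embedding + non-embedding != total")
        if name in moe_rows:
            # Assumption: only some experts run per token, so activated < total.
            if not activated < total:
                problems.append(f"{name}: activated is not below total")
        elif activated != total:
            # Assumption: dense models use every parameter on every token.
            problems.append(f"{name}: dense row with activated != total")
    return problems


if __name__ == "__main__":
    issues = find_inconsistencies(ROWS, MOE_ROWS)
    print("\n".join(issues) if issues else "all rows consistent")
```

A check like this is cheap to rerun whenever a row is added, and it catches a mistyped or misplaced count before the table is published.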