DeepSeekDDM committed on
Commit 108e1e0 (verified) · 1 Parent(s): 4c1f24c

fix the calculation of act params

Files changed (1)
  1. README_WEIGHTS.md +1 -1
README_WEIGHTS.md CHANGED
@@ -18,7 +18,7 @@ The DeepSeek-V3 weight file consists of two main components: **Main Model Weight
  - Input/output embedding layers and a complete set of 61 Transformer hidden layers.
  - **Parameter Count**:
  - Total parameters: **671B**
- - Activation parameters: **36.7B** (including 0.9B for Embedding and 0.9B for the output Head).
+ - Activation parameters: **36.6B**.

  #### Structural Details