Commit dc69c52 by silence09 · verified · 1 parent: c449881

Update README.md

  1. README.md +9 -5
README.md CHANGED
@@ -1,12 +1,17 @@
  # Lightweight Deepseek R1 (3 Hidden Layers Version)
 
  This project is created using the official **Deepseek R1** model script (`modeling_deepseek.py`) from [Hugging Face](https://huggingface.co/deepseek-ai/DeepSeek-R1/blob/main/modeling_deepseek.py). It implements a **3-layer version** of Deepseek R1 with randomly initialized weights.
 
  ## Model Structure
  The three hidden layers consist of:
- - **A hidden layer using Dense MLP**
- - **A hidden layer using MoE (Mixture of Experts) as MLP part**
- - **A MTP (Multi-Token Pretraining) layer**
 
  ## Purpose
  The purpose of these weights is to provide a lightweight implementation for researchers who want to study the model architecture and run experiments quickly.
@@ -38,5 +43,4 @@ messages.append({"role": "assistant", "content": completion})
  ```
 
  ## More Info
- It was created using the python script available at [this repository](https://github.com/silencelamb/naked_llama/blob/main/hf_example/create_deepseek_r1_3layers.py)
-
 
+ ---
+ license: mit
+ base_model:
+ - deepseek-ai/DeepSeek-R1
+ ---
  # Lightweight Deepseek R1 (3 Hidden Layers Version)
 
  This project is created using the official **Deepseek R1** model script (`modeling_deepseek.py`) from [Hugging Face](https://huggingface.co/deepseek-ai/DeepSeek-R1/blob/main/modeling_deepseek.py). It implements a **3-layer version** of Deepseek R1 with randomly initialized weights.
 
  ## Model Structure
  The three hidden layers consist of:
+ - **A hidden layer: MLA + Dense MLP**
+ - **A hidden layer: MLA + MoE (Mixture of Experts) MLP**
+ - **An MTP (Multi-Token Prediction) layer** (MTP can also be used for speculative decoding at inference time)
 
  ## Purpose
  The purpose of these weights is to provide a lightweight implementation for researchers who want to study the model architecture and run experiments quickly.
 
  ```
 
  ## More Info
+ It was created using the python script available at [this repository](https://github.com/silencelamb/naked_llama/blob/main/hf_example/create_deepseek_r1_3layers.py)
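
As a rough illustration of the three layers listed under Model Structure, the sketch below shows one way a config could map layer indices to layer types. It is not taken from the creation script: the `TinyConfig` class, its default values, and the `layer_kinds` helper are hypothetical, and the field names are assumptions modeled on DeepSeek-style configs. The rule "dense MLP for the first k layers, MoE afterwards" is likewise an assumption about how the modeling script chooses the MLP type.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TinyConfig:
    # Hypothetical values chosen only to reproduce the three layers
    # named in the README; the real checkpoint config may differ.
    num_hidden_layers: int = 2             # regular decoder layers
    first_k_dense_replace: int = 1         # layers below this index use a dense MLP
    moe_layer_freq: int = 1                # every Nth eligible layer uses MoE
    n_routed_experts: Optional[int] = 8    # None would disable MoE entirely
    num_nextn_predict_layers: int = 1      # MTP layers appended after the decoder


def layer_kinds(cfg: TinyConfig) -> List[str]:
    """Return a human-readable label for each layer, in order."""
    kinds = []
    for idx in range(cfg.num_hidden_layers):
        # Assumed selection rule: MoE only past the first k dense layers.
        use_moe = (
            cfg.n_routed_experts is not None
            and idx >= cfg.first_k_dense_replace
            and idx % cfg.moe_layer_freq == 0
        )
        kinds.append("MLA + MoE MLP" if use_moe else "MLA + Dense MLP")
    # MTP layers are counted separately from the regular decoder layers here.
    kinds.extend(["MTP"] * cfg.num_nextn_predict_layers)
    return kinds


if __name__ == "__main__":
    for i, kind in enumerate(layer_kinds(TinyConfig())):
        print(i, kind)
```

With the hypothetical defaults above, the helper yields one dense layer, one MoE layer, and one MTP layer, matching the list in the README.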