Update README.md
README.md
CHANGED
@@ -1,12 +1,17 @@
1 |
# Lightweight Deepseek R1 (3 Hidden Layers Version)
|
2 |
|
3 |
This project was created using the official **Deepseek R1** model script (`modeling_deepseek.py`) from [Hugging Face](https://huggingface.co/deepseek-ai/DeepSeek-R1/blob/main/modeling_deepseek.py). It implements a **3-layer version** of Deepseek R1 with randomly initialized weights.
|
4 |
|
5 |
## Model Structure
|
6 |
The three hidden layers consist of:
|
7 |
-
- **A hidden layer
|
8 |
-
- **A hidden layer
|
9 |
-
- **An MTP (Multi-Token Prediction) layer**
|
10 |
|
11 |
## Purpose
|
12 |
The purpose of these weights is to provide a lightweight implementation for researchers who want to study the model architecture and run experiments quickly.
|
@@ -38,5 +43,4 @@ messages.append({"role": "assistant", "content": completion})
|
|
38 |
```
|
39 |
|
40 |
## More Info
|
41 |
-
It was created using the Python script available at [this repository](https://github.com/silencelamb/naked_llama/blob/main/hf_example/create_deepseek_r1_3layers.py).
|
42 |
-
|
|
|
1 |
+
---
|
2 |
+
license: mit
|
3 |
+
base_model:
|
4 |
+
- deepseek-ai/DeepSeek-R1
|
5 |
+
---
|
6 |
# Lightweight Deepseek R1 (3 Hidden Layers Version)
|
7 |
|
8 |
This project was created using the official **Deepseek R1** model script (`modeling_deepseek.py`) from [Hugging Face](https://huggingface.co/deepseek-ai/DeepSeek-R1/blob/main/modeling_deepseek.py). It implements a **3-layer version** of Deepseek R1 with randomly initialized weights.
|
9 |
|
10 |
## Model Structure
|
11 |
The three hidden layers consist of:
|
12 |
+
- **A hidden layer: MLA + Dense MLP**
|
13 |
+
- **A hidden layer: MLA + MoE (Mixture of Experts) MLP**
|
14 |
+
- **An MTP (Multi-Token Prediction) layer** (the MTP module can be used for speculative decoding at inference time)
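
The layer layout above can be illustrated with a minimal sketch of how a config might select dense versus MoE layers. The field names follow the published DeepSeek-V3/R1 `config.json`, but the values, the `TinyConfig` class, and the `layer_kinds` helper are assumptions made for illustration; they are not taken from the creation script.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TinyConfig:
    # Hypothetical values chosen to yield one dense, one MoE, and one MTP layer.
    num_hidden_layers: int = 2          # regular decoder layers
    first_k_dense_replace: int = 1      # the first k layers use a dense MLP
    moe_layer_freq: int = 1             # every n-th layer after that is MoE
    n_routed_experts: Optional[int] = 256
    num_nextn_predict_layers: int = 1   # MTP modules appended after the decoder layers


def layer_kinds(cfg: TinyConfig) -> List[str]:
    """Return a human-readable description of each layer, in order."""
    kinds = []
    for idx in range(cfg.num_hidden_layers):
        # Assumed selection rule: MoE only when experts are configured and the
        # layer index is past the leading dense layers.
        use_moe = (
            cfg.n_routed_experts is not None
            and idx >= cfg.first_k_dense_replace
            and idx % cfg.moe_layer_freq == 0
        )
        kinds.append("MLA + MoE MLP" if use_moe else "MLA + Dense MLP")
    kinds.extend(["MTP"] * cfg.num_nextn_predict_layers)
    return kinds


if __name__ == "__main__":
    for i, kind in enumerate(layer_kinds(TinyConfig())):
        print(i, kind)
```

Setting `first_k_dense_replace` to `0` in this sketch would make both decoder layers MoE, which is one way to see how the dense/MoE split depends on the config rather than on the layer code itself.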
|
15 |
|
16 |
## Purpose
|
17 |
The purpose of these weights is to provide a lightweight implementation for researchers who want to study the model architecture and run experiments quickly.
|
|
|
43 |
```
|
44 |
|
45 |
## More Info
|
46 |
+
It was created using the Python script available at [this repository](https://github.com/silencelamb/naked_llama/blob/main/hf_example/create_deepseek_r1_3layers.py).
|
|