silence09 committed on
Commit 41566da · verified · 1 Parent(s): 6da791b

Update README.md

Files changed (1)
  1. README.md +5 -5
README.md CHANGED
@@ -7,6 +7,11 @@ base_model:
 
  This project is created using the official **Deepseek R1** model script (`modeling_deepseek.py`) from [Hugging Face](https://huggingface.co/deepseek-ai/DeepSeek-R1/blob/main/modeling_deepseek.py). It implements a **2-layer version** of Deepseek R1 with randomly initialized weights and smaller dimensions.
 
+ ## Purpose
+ The purpose of these weights is to provide a lightweight implementation for researchers who want to study the model architecture and run experiments quickly.
+
+ The original **Deepseek R1 model** requires an **8x H200 GPU setup** and runs on the **vLLM/SGLang framework**, making it difficult to deploy on standard hardware.
+
  ## Model Structure
  The three hidden layers consist of:
  - **A hidden layer: MLA + Dense MLP**
@@ -25,11 +30,6 @@ The difference between this model and the original **Deepseek R1** is shown belo
  }
  ```
 
- ## Purpose
- The purpose of these weights is to provide a lightweight implementation for researchers who want to study the model architecture and run experiments quickly.
-
- The original **Deepseek R1 model** requires an **8x H200 GPU setup** and runs on the **vLLM/SGLang framework**, making it difficult to deploy on standard hardware.
-
  ## Usage
 
  ```python