silence09/DeepSeek-R1-Small-2layers

silence09 committed on Feb 7

Commit 41566da · verified · 1 parent: 6da791b

Update README.md

Files changed (1)

README.md +5 -5

@@ -7,6 +7,11 @@ base_model:
 This project is built on the official **DeepSeek R1** model script (`modeling_deepseek.py`) from [Hugging Face](https://huggingface.co/deepseek-ai/DeepSeek-R1/blob/main/modeling_deepseek.py). It implements a **2-layer version** of DeepSeek R1 with randomly initialized weights and smaller dimensions.
+## Purpose
+These weights provide a lightweight implementation for researchers who want to study the model architecture and run experiments quickly.
+The original **DeepSeek R1 model** requires an **8x H200 GPU setup** and runs on the **vLLM or SGLang** frameworks, making it difficult to deploy on standard hardware.
 ## Model Structure
 The two hidden layers consist of:
 - **A hidden layer: MLA + Dense MLP**
@@ -25,11 +30,6 @@ The difference between this model and the original **DeepSeek R1** is shown belo
 }
 ```
-## Purpose
-These weights provide a lightweight implementation for researchers who want to study the model architecture and run experiments quickly.
-The original **DeepSeek R1 model** requires an **8x H200 GPU setup** and runs on the **vLLM or SGLang** frameworks, making it difficult to deploy on standard hardware.
 ## Usage
 ```python