silence09/DeepSeek-R1-3layers


silence09 committed on Feb 7

Commit dc69c52 · verified · 1 Parent(s): c449881

Update README.md

Files changed (1)

README.md +9 -5

README.md CHANGED

@@ -1,12 +1,17 @@
 # Lightweight Deepseek R1 (3 Hidden Layers Version)
 This project is created using the official **Deepseek R1** model script (`modeling_deepseek.py`) from [Hugging Face](https://huggingface.co/deepseek-ai/DeepSeek-R1/blob/main/modeling_deepseek.py). It implements a **3-layer version** of Deepseek R1 with randomly initialized weights.
 ## Model Structure
 The three hidden layers consist of:
-- **A hidden layer using Dense MLP**
-- **A hidden layer using MoE (Mixture of Experts) as MLP part**
-- **A MTP (Multi-Token Pretraining) layer**
 ## Purpose
 The purpose of these weights is to provide a lightweight implementation for researchers who want to study the model architecture and run experiments quickly.
@@ -38,5 +43,4 @@ messages.append({"role": "assistant", "content": completion})
 ```
 ## More Info
-It was created using the python script available at [this repository](https://github.com/silencelamb/naked_llama/blob/main/hf_example/create_deepseek_r1_3layers.py)

+---
+license: mit
+base_model:
+- deepseek-ai/DeepSeek-R1
+---
 # Lightweight Deepseek R1 (3 Hidden Layers Version)
 This project is created using the official **Deepseek R1** model script (`modeling_deepseek.py`) from [Hugging Face](https://huggingface.co/deepseek-ai/DeepSeek-R1/blob/main/modeling_deepseek.py). It implements a **3-layer version** of Deepseek R1 with randomly initialized weights.
 ## Model Structure
 The three hidden layers consist of:
+- **A hidden layer: MLA + Dense MLP**
+- **A hidden layer: MLA + MoE (Mixture of Experts) MLP**
+- **A MTP (Multi-Token Pretraining) layer (MTP can be regarded or used for speculative decoding in inference) **
 ## Purpose
 The purpose of these weights is to provide a lightweight implementation for researchers who want to study the model architecture and run experiments quickly.
 ```
 ## More Info
+It was created using the python script available at [this repository](https://github.com/silencelamb/naked_llama/blob/main/hf_example/create_deepseek_r1_3layers.py)
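
The three-layer layout listed under "Model Structure" can be illustrated with a minimal, dependency-free sketch. It does not load the model or its real config. The field names (`num_hidden_layers`, `first_k_dense_replace`, `moe_layer_freq`, `num_nextn_predict_layers`) are assumed to mirror the DeepSeek configuration, and the values are hypothetical ones chosen only to reproduce the layout described in the README. The `TinyLayout` class and `describe_layers` helper are illustrative names, not part of the repository.

```python
from dataclasses import dataclass


@dataclass
class TinyLayout:
    # Hypothetical values, assumed only to reproduce the README's layout;
    # they are not read from the repository's actual config.json.
    num_hidden_layers: int = 2         # regular decoder layers (MLA + MLP)
    first_k_dense_replace: int = 1     # first k layers use a dense MLP
    moe_layer_freq: int = 1            # every n-th later layer uses MoE
    num_nextn_predict_layers: int = 1  # extra MTP layers appended at the end


def describe_layers(cfg: TinyLayout) -> list[str]:
    """Return a human-readable description of each layer, in order."""
    layers = []
    for idx in range(cfg.num_hidden_layers):
        # A layer uses MoE once past the dense prefix and on the MoE stride.
        is_moe = (
            idx >= cfg.first_k_dense_replace
            and idx % cfg.moe_layer_freq == 0
        )
        mlp = "MoE MLP" if is_moe else "Dense MLP"
        layers.append(f"layer {idx}: MLA + {mlp}")
    for k in range(cfg.num_nextn_predict_layers):
        # MTP layers are numbered after the regular decoder layers.
        layers.append(f"layer {cfg.num_hidden_layers + k}: MTP")
    return layers


if __name__ == "__main__":
    for line in describe_layers(TinyLayout()):
        print(line)
```

Changing `first_k_dense_replace` or `num_hidden_layers` in this sketch shows how the dense/MoE split would shift under the same assumed rule.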