|
--- |
|
license: mit |
|
base_model: |
|
- deepseek-ai/DeepSeek-R1 |
|
--- |
|
# Lightweight DeepSeek R1 (2-Hidden-Layer Version with Smaller Dimensions)
|
|
|
This project was created using the official **DeepSeek R1** model script (`modeling_deepseek.py`) from [Hugging Face](https://huggingface.co/deepseek-ai/DeepSeek-R1/blob/main/modeling_deepseek.py). It implements a **2-layer version** of DeepSeek R1 with randomly initialized weights and smaller dimensions.
|
|
|
## Purpose |
|
These weights provide a lightweight stand-in for researchers who want to study the model architecture and run it locally without large hardware.
|
|
|
The original **DeepSeek R1** model requires an **8x H200 GPU setup** and runs on the **vLLM/SGLang framework**, making it difficult to deploy on standard hardware.
|
|
|
## Model Structure |
|
The two hidden layers consist of:

- **Layer 0: MLA + Dense MLP**

- **Layer 1: MLA + MoE (Mixture of Experts) MLP**
|
|
|
The configuration differences between this model and the original **DeepSeek R1** are shown below:
|
```json
{
  "first_k_dense_replace": 1,
  "intermediate_size": 1024,
  "n_routed_experts": 64,
  "num_experts_per_tok": 4,
  "moe_intermediate_size": 128,
  "num_hidden_layers": 2,
  "num_nextn_predict_layers": 0
}
```
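
As a quick illustration of how these fields map onto the layer layout, the sketch below (not part of the original script) loads the config and prints which MLP variant each layer uses. In the DeepSeek modeling script, layers with index below `first_k_dense_replace` use the dense MLP and the remaining layers use the MoE MLP; `trust_remote_code=True` is assumed to be needed for the custom config class.

```python
from transformers import AutoConfig

# Sketch: inspect the small model's config (assumes Hub access and
# trust_remote_code for the custom DeepSeek config class).
config = AutoConfig.from_pretrained(
    "silence09/DeepSeek-R1-Small-2layers", trust_remote_code=True
)

# Layers with index < first_k_dense_replace use a dense MLP;
# the remaining layers use the MoE MLP.
for layer_idx in range(config.num_hidden_layers):
    mlp = "Dense MLP" if layer_idx < config.first_k_dense_replace else "MoE MLP"
    print(f"layer {layer_idx}: MLA + {mlp}")
```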
|
|
|
## Usage |
|
|
|
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code is required because the repo ships the custom
# modeling_deepseek.py script
model = AutoModelForCausalLM.from_pretrained(
    'silence09/DeepSeek-R1-Small-2layers',
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).cuda()
tokenizer = AutoTokenizer.from_pretrained('silence09/DeepSeek-R1-Small-2layers')

prompt = "Who are you?"
messages = [{"role": "user", "content": prompt}]
prompt_tokens = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

generated_ids = model.generate(prompt_tokens, max_new_tokens=100, do_sample=False)
# strip the prompt tokens so only the completion is decoded
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(prompt_tokens, generated_ids)
]
# note: the weights are randomly initialized, so the output is not meaningful text
completion = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(completion)
messages.append({"role": "assistant", "content": completion})
```
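
Because the weights are randomly initialized, the completion will be gibberish; the useful check is that the model loads and generates at all. As an optional follow-up (a sketch reusing the `model` object from above), you can count parameters to confirm the reduced size:

```python
# Sketch: confirm the reduced model size by counting parameters
total_params = sum(p.numel() for p in model.parameters())
print(f"total parameters: {total_params / 1e6:.1f}M")
```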
|
|
|
## More Info |
|
It was created using the Python script available in [this repository](https://github.com/silencelamb/naked_llama/blob/main/hf_example/create_deepseek_r1_small_2layers.py).