---
license: mit
datasets:
- eagle0504/openai-gsm8k-enhanced-using-together-ai-deepseek-train8k-test1k-v1
language:
- en
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
library_name: transformers
tags:
- fine-tuned
- unsloth
- trl
- grpo
- deepseek
- gsm8k
- reasoning
---

# **DeepSeek-R1-Distill-Qwen-1.5B Fine-Tuned on GSM8K with Chain-of-Thought Augmentation**

## **Model Overview**
This model is a fine-tuned version of **DeepSeek-R1-Distill-Qwen-1.5B**, trained on the **OpenAI GSM8K dataset**, augmented with **Chain-of-Thought (CoT) reasoning** using **DeepSeek-V3**. The fine-tuning process enhances the model’s **mathematical problem-solving abilities**, allowing it to provide **step-by-step solutions** with deeper reasoning.

### **πŸ”Ή Key Features**
- **Base Model**: DeepSeek-R1-Distill-Qwen-1.5B  
- **Fine-Tuned On**: GSM8K dataset with DeepSeek-V3-enhanced reasoning  
- **Improved Mathematical Reasoning**: Generates detailed step-by-step CoT explanations  
- **Optimized for GRPO Training**: Trained using `trl` and `unsloth` for efficient fine-tuning  

---

## **πŸ“Š Dataset & Training Details**
- **Dataset**: `eagle0504/openai-gsm8k-enhanced-using-together-ai-deepseek-train8k-test1k-v1`
  - **8K train samples**, **1K test samples**
  - Contains **question**, **answer**, and **CoT reasoning**
- **Training Methodology**:
  - Used **Group Relative Policy Optimization (GRPO)** via `trl`
  - Applied **gradient accumulation** to manage larger batch sizes
  - Integrated **DeepSeek-V3 augmentation** for enhanced logical reasoning
- **Fine-tuning Tools**:
  - **Unsloth** for memory-efficient fine-tuning of the Qwen-based model
  - **Hugging Face Transformers** for model training

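GRPO training on GSM8K typically scores sampled completions with a rule-based reward. As a minimal illustration (the function name and exact matching rule are assumptions, not taken from this model's training code), a correctness reward might compare the last number in a completion against the reference answer:

```python
import re

def correctness_reward(completion: str, reference_answer: str) -> float:
    """Hypothetical GRPO-style reward: 1.0 if the last number in the
    completion matches the reference answer, else 0.0. Illustrative only."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion.replace(",", ""))
    if not numbers:
        return 0.0
    return 1.0 if numbers[-1] == reference_answer.strip() else 0.0
```

In practice, rewards like this are passed to `trl`'s GRPO trainer, which compares groups of sampled completions per prompt; real setups usually combine a correctness reward with format rewards.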
For those interested in replicating the fine-tuning process, I have shared an **updated Colab notebook** πŸ““:  
πŸ”— [Colab Notebook](https://colab.research.google.com/drive/1HV0YkyiTD55j1xLRBHwJ_q3ex82W5EXr?usp=sharing)

You will need:  
βœ… Hugging Face Token  
βœ… Together.AI API Key  
βœ… Unsloth Package  

---

## **πŸš€ How to Run the Model (Mac via `llama.cpp`)**
You can run this model **locally on macOS** using `llama.cpp`.  

### **1️⃣ Install Homebrew (If Not Installed)**
```sh
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```
Then add Homebrew to your PATH:
```sh
echo 'eval "$(/opt/homebrew/bin/brew shellenv)"' >> ~/.zprofile
eval "$(/opt/homebrew/bin/brew shellenv)"
```

---

### **2️⃣ Install `llama.cpp`**
```sh
brew install llama.cpp
```

---

### **3️⃣ Run the Model with `llama-cli`**
```sh
llama-cli -hf eagle0504/deepseek-r1-qwen-1.5b-gsm8k-enhanced-gguf:Q8_0
```

---

### **4️⃣ Alternative: Run Locally via GGUF**
```sh
mkdir -p ~/llama_models && cd ~/llama_models
curl -L -O https://huggingface.co/eagle0504/deepseek-r1-qwen-1.5b-gsm8k-enhanced-gguf/resolve/main/Q8_0.gguf
llama-cli -m ~/llama_models/Q8_0.gguf --interactive
```

---

## **πŸ“Œ How to Use Model via Python (`transformers`)**
You can load the model with **Hugging Face Transformers**:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "eagle0504/deepseek-r1-qwen-1.5b-gsm8k-enhanced"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "A farmer has 24 apples. He gives 6 to each of his 3 children. How many does he have left?"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_length=200)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
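GSM8K reference answers end with a `#### <answer>` marker, so when post-processing the model's CoT output you may want to pull out just the final number. A small helper for that (the function name and fallback rule are illustrative assumptions, not part of the model or dataset code):

```python
import re

def extract_final_answer(text: str):
    """Pull the final numeric answer from a GSM8K-style completion.

    Looks for the canonical '#### <answer>' marker first, then falls back
    to the last number in the text. Illustrative post-processing only."""
    m = re.search(r"####\s*(-?[\d,]+(?:\.\d+)?)", text)
    if m:
        return m.group(1).replace(",", "")
    numbers = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return numbers[-1] if numbers else None
```

This makes it easy to compare generated answers against the dataset's `answer` field when evaluating accuracy.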

---

## **πŸ”¬ Expected Performance**
Compared to the base **DeepSeek-R1-Distill-Qwen-1.5B**, this fine-tuned model:
- Provides **more detailed Chain-of-Thought (CoT) explanations** for GSM8K problems.
- Improves **logical reasoning and step-by-step answer formulation**.
- Generates **clearer, more structured solutions**, making it **ideal for educational use**.

---

## **πŸ—‚ Model Hosting & License**
πŸ“Œ **Model on Hugging Face Hub**:  
πŸ‘‰ **[eagle0504/deepseek-r1-qwen-1.5b-gsm8k-enhanced](https://huggingface.co/eagle0504/deepseek-r1-qwen-1.5b-gsm8k-enhanced)**  

πŸ“œ **License**: MIT License – Open for modification and distribution.

---

If you have **feedback or ideas for improvement**, feel free to reach out! πŸš€πŸ”₯  

#AI #MachineLearning #DeepSeek #GSM8K #LLM #ChainOfThought #HuggingFace #GRPO #Reasoning