---
license: mit
datasets:
- eagle0504/openai-gsm8k-enhanced-using-together-ai-deepseek-train8k-test1k-v1
language:
- en
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
library_name: transformers
tags:
- fine-tuned
- unsloth
- trl
- grpo
- deepseek
- gsm8k
- reasoning
---

# **DeepSeek-R1-Distill-Qwen-1.5B Fine-Tuned on GSM8K with Chain-of-Thought Augmentation**

## **Model Overview**
This model is a fine-tuned version of **DeepSeek-R1-Distill-Qwen-1.5B**, trained on the **OpenAI GSM8K dataset**, augmented with **Chain-of-Thought (CoT) reasoning** using **DeepSeek-V3**. The fine-tuning process enhances the model’s **mathematical problem-solving abilities**, allowing it to provide **step-by-step solutions** with deeper reasoning.

### **πŸ”Ή Key Features**
- **Base Model**: DeepSeek-R1-Distill-Qwen-1.5B  
- **Fine-Tuned On**: GSM8K dataset with DeepSeek-V3-enhanced reasoning  
- **Improved Mathematical Reasoning**: Generates detailed step-by-step CoT explanations  
- **Optimized for GRPO Training**: Trained using `trl` and `unsloth` for efficient fine-tuning  

---

## **πŸ“Š Dataset & Training Details**
- **Dataset**: `eagle0504/openai-gsm8k-enhanced-using-together-ai-deepseek-train8k-test1k-v1`
  - **8K train samples**, **1K test samples**
  - Contains **question**, **answer**, and **CoT reasoning**
- **Training Methodology**:
  - Used **Group Relative Policy Optimization (GRPO)** via `trl`
  - Applied **gradient accumulation** to manage larger batch sizes
  - Integrated **DeepSeek-V3 augmentation** for enhanced logical reasoning
- **Fine-tuning Tools**:
  - **Unsloth** for memory-efficient fine-tuning of the Qwen-based model
  - **Hugging Face Transformers** for model training

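GRPO training on GSM8K typically scores sampled completions with a rule-based reward. As a minimal illustration (the function name and exact matching rule are assumptions, not taken from this model's training code), a correctness reward might compare the last number in a completion against the reference answer:

```python
import re

def correctness_reward(completion: str, reference_answer: str) -> float:
    """Hypothetical GRPO-style reward: 1.0 if the last number in the
    completion matches the reference answer, else 0.0. Illustrative only."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion.replace(",", ""))
    if not numbers:
        return 0.0
    return 1.0 if numbers[-1] == reference_answer.strip() else 0.0
```

In practice, rewards like this are passed to `trl`'s GRPO trainer, which compares groups of sampled completions per prompt; real setups usually combine a correctness reward with format rewards.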
For those interested in replicating the fine-tuning process, I have shared an **updated Colab notebook** πŸ““:  
πŸ”— [Colab Notebook](https://colab.research.google.com/drive/1HV0YkyiTD55j1xLRBHwJ_q3ex82W5EXr?usp=sharing)

You will need:  
βœ… Hugging Face Token  
βœ… Together.AI API Key  
βœ… Unsloth Package  

---

## **πŸš€ How to Run the Model (Mac via `llama.cpp`)**
You can run this model **locally on macOS** using `llama.cpp`.  

### **1️⃣ Install Homebrew (If Not Installed)**
```sh
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```
Then add Homebrew to your PATH:
```sh
echo 'eval "$(/opt/homebrew/bin/brew shellenv)"' >> ~/.zprofile
eval "$(/opt/homebrew/bin/brew shellenv)"
```

---

### **2️⃣ Install `llama.cpp`**
```sh
brew install llama.cpp
```

---

### **3️⃣ Run the Model with `llama-cli`**
```sh
llama-cli -hf eagle0504/deepseek-r1-qwen-1.5b-gsm8k-enhanced-gguf:Q8_0
```

---

### **4️⃣ Alternative: Run Locally via GGUF**
```sh
mkdir -p ~/llama_models && cd ~/llama_models
curl -L -O https://huggingface.co/eagle0504/deepseek-r1-qwen-1.5b-gsm8k-enhanced-gguf/resolve/main/Q8_0.gguf
llama-cli -m ~/llama_models/Q8_0.gguf --interactive
```

---

## **πŸ“Œ How to Use Model via Python (`transformers`)**
You can load the model with **Hugging Face Transformers**:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "eagle0504/deepseek-r1-qwen-1.5b-gsm8k-enhanced"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "A farmer has 24 apples. He gives 6 to each of his 3 children. How many does he have left?"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_length=200)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
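GSM8K reference answers end with a `#### <answer>` marker, so when post-processing the model's CoT output you may want to pull out just the final number. A small helper for that (the function name and fallback rule are illustrative assumptions, not part of the model or dataset code):

```python
import re

def extract_final_answer(text: str):
    """Pull the final numeric answer from a GSM8K-style completion.

    Looks for the canonical '#### <answer>' marker first, then falls back
    to the last number in the text. Illustrative post-processing only."""
    m = re.search(r"####\s*(-?[\d,]+(?:\.\d+)?)", text)
    if m:
        return m.group(1).replace(",", "")
    numbers = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return numbers[-1] if numbers else None
```

This makes it easy to compare generated answers against the dataset's `answer` field when evaluating accuracy.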

---

## **πŸ”¬ Expected Performance**
Compared to the base **DeepSeek-R1-Distill-Qwen-1.5B**, this fine-tuned model:
- Provides **more detailed Chain-of-Thought (CoT) explanations** for GSM8K problems.
- Improves **logical reasoning and step-by-step answer formulation**.
- Generates **clearer, more structured solutions**, making it **ideal for educational use**.

---

## **πŸ—‚ Model Hosting & License**
πŸ“Œ **Model on Hugging Face Hub**:  
πŸ‘‰ **[eagle0504/deepseek-r1-qwen-1.5b-gsm8k-enhanced](https://huggingface.co/eagle0504/deepseek-r1-qwen-1.5b-gsm8k-enhanced)**  

πŸ“œ **License**: MIT License – Open for modification and distribution.

---

If you have **feedback or ideas for improvement**, feel free to reach out! πŸš€πŸ”₯  

#AI #MachineLearning #DeepSeek #GSM8K #LLM #ChainOfThought #HuggingFace #GRPO #Reasoning