---
license: mit
datasets:
- eagle0504/openai-gsm8k-enhanced-using-together-ai-deepseek-train8k-test1k-v1
language:
- en
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
library_name: transformers
tags:
- fine-tuned
- unsloth
- trl
- grpo
- deepseek
- gsm8k
- reasoning
---
# **DeepSeek-R1-Distill-Qwen-1.5B Fine-Tuned on GSM8K with Chain-of-Thought Augmentation**
## **Model Overview**
This model is a fine-tuned version of **DeepSeek-R1-Distill-Qwen-1.5B**, trained on the **OpenAI GSM8K dataset**, augmented with **Chain-of-Thought (CoT) reasoning** using **DeepSeek-V3**. The fine-tuning process enhances the model’s **mathematical problem-solving abilities**, allowing it to provide **step-by-step solutions** with deeper reasoning.
### **πŸ”Ή Key Features**
- **Base Model**: DeepSeek-R1-Distill-Qwen-1.5B
- **Fine-Tuned On**: GSM8K dataset with DeepSeek-V3-enhanced reasoning
- **Improved Mathematical Reasoning**: Generates detailed step-by-step CoT explanations
- **Optimized for GRPO Training**: Trained using `trl` and `unsloth` for efficient fine-tuning
---
## **πŸ“Š Dataset & Training Details**
- **Dataset**: `eagle0504/openai-gsm8k-enhanced-using-together-ai-deepseek-train8k-test1k-v1`
- **8K train samples**, **1K test samples**
- Contains **question**, **answer**, and **CoT reasoning**
- **Training Methodology**:
- Used **Group Relative Policy Optimization (GRPO)** via `trl` (see the sketch after this list)
- Applied **gradient accumulation** to manage larger batch sizes
- Integrated **DeepSeek-V3 augmentation** for enhanced logical reasoning
- **Fine-tuning Tools**:
- **Unsloth** for memory-efficient fine-tuning of the Qwen-based model
- **Hugging Face Transformers** for model training
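To make the training setup concrete, here is a minimal GRPO sketch using plain `trl` (without the Unsloth integration used in the actual run). The `question` column mapping, the reward function, and the hyperparameters are illustrative assumptions; the exact recipe is in the Colab notebook linked below.
```python
# Minimal GRPO fine-tuning sketch with trl. Column names, the reward function,
# and hyperparameters are illustrative assumptions, not the exact training recipe.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Load the CoT-enhanced GSM8K dataset; GRPOTrainer expects a "prompt" column
dataset = load_dataset(
    "eagle0504/openai-gsm8k-enhanced-using-together-ai-deepseek-train8k-test1k-v1",
    split="train",
)
dataset = dataset.map(lambda row: {"prompt": row["question"]})  # assumed column name

# Toy reward: favor completions that state an explicit answer.
# The real run scores completions against the reference answers.
def reward_has_answer(completions, **kwargs):
    return [1.0 if "answer" in completion.lower() else 0.0 for completion in completions]

training_args = GRPOConfig(
    output_dir="grpo-gsm8k",
    gradient_accumulation_steps=4,  # mirrors the gradient accumulation noted above
    logging_steps=10,
)

trainer = GRPOTrainer(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    reward_funcs=reward_has_answer,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```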
For those interested in replicating the fine-tuning process, I have shared an **updated Colab notebook** πŸ““:
πŸ”— [Colab Notebook](https://colab.research.google.com/drive/1HV0YkyiTD55j1xLRBHwJ_q3ex82W5EXr?usp=sharing)
You will need:
βœ… Hugging Face Token
βœ… Together.AI API Key
βœ… Unsloth Package
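A typical environment setup looks like the following (package names only; exact versions may differ, so follow the notebook):
```sh
pip install unsloth trl transformers datasets
```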
---
## **πŸš€ How to Run the Model (Mac via `llama.cpp`)**
You can run this model **locally on macOS** using `llama.cpp`.
### **1️⃣ Install Homebrew (If Not Installed)**
```sh
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```
Then add Homebrew to your PATH:
```sh
echo 'eval "$(/opt/homebrew/bin/brew shellenv)"' >> ~/.zprofile
eval "$(/opt/homebrew/bin/brew shellenv)"
```
---
### **2️⃣ Install `llama.cpp`**
```sh
brew install llama.cpp
```
---
### **3️⃣ Run the Model with `llama-cli`**
```sh
llama-cli -hf eagle0504/deepseek-r1-qwen-1.5b-gsm8k-enhanced-gguf:Q8_0
```
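For a quick one-shot test without entering interactive mode, you can pass a prompt and a generation limit directly (the prompt and token count here are just an example):
```sh
llama-cli -hf eagle0504/deepseek-r1-qwen-1.5b-gsm8k-enhanced-gguf:Q8_0 \
  -p "A farmer has 24 apples. He gives 6 to each of his 3 children. How many does he have left?" \
  -n 256
```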
---
### **4️⃣ Alternative: Run Locally via GGUF**
```sh
mkdir -p ~/llama_models && cd ~/llama_models
# Download the Q8_0 quantization (use `curl -L -O <url>` if wget is not installed)
wget https://huggingface.co/eagle0504/deepseek-r1-qwen-1.5b-gsm8k-enhanced-gguf/resolve/main/Q8_0.gguf
llama-cli -m ~/llama_models/Q8_0.gguf --interactive
```
---
## **πŸ“Œ How to Use Model via Python (`transformers`)**
You can load the model with **Hugging Face Transformers**:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "eagle0504/deepseek-r1-qwen-1.5b-gsm8k-enhanced"

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "A farmer has 24 apples. He gives 6 to each of his 3 children. How many does he have left?"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate a step-by-step solution (cap applies to newly generated tokens)
output = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
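Because the base model is an instruct-style chat model, you may get better-structured CoT output by prompting through the tokenizer's chat template. A minimal sketch, reusing the `tokenizer`, `model`, and `prompt` objects from above (the `max_new_tokens` value is illustrative):
```python
# Sketch: prompt via the chat template instead of raw text
messages = [{"role": "user", "content": prompt}]
chat_inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output = model.generate(chat_inputs, max_new_tokens=512)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```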
---
## **πŸ”¬ Expected Performance**
Compared to the base **DeepSeek-R1-Distill-Qwen-1.5B**, this fine-tuned model:
- Provides **more detailed Chain-of-Thought (CoT) explanations** for GSM8K problems.
- Improves **logical reasoning and step-by-step answer formulation**.
- Generates **clearer, more structured solutions**, making it well-suited for educational use.
---
## **πŸ—‚ Model Hosting & License**
πŸ“Œ **Model on Hugging Face Hub**:
πŸ‘‰ **[eagle0504/deepseek-r1-qwen-1.5b-gsm8k-enhanced](https://huggingface.co/eagle0504/deepseek-r1-qwen-1.5b-gsm8k-enhanced)**
πŸ“œ **License**: MIT License – Open for modification and distribution.
---
If you have **feedback or ideas for improvement**, feel free to reach out! πŸš€πŸ”₯
#AI #MachineLearning #DeepSeek #GSM8K #LLM #ChainOfThought #HuggingFace #GRPO #Reasoning