---
license: mit
datasets:
- eagle0504/openai-gsm8k-enhanced-using-together-ai-deepseek-train8k-test1k-v1
language:
- en
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
library_name: transformers
tags:
- fine-tuned
- unsloth
- trl
- grpo
- deepseek
- gsm8k
- reasoning
---

# **DeepSeek-R1-Distill-Qwen-1.5B Fine-Tuned on GSM8K with Chain-of-Thought Augmentation**

## **Model Overview**

This model is a fine-tuned version of **DeepSeek-R1-Distill-Qwen-1.5B**, trained on the **OpenAI GSM8K dataset** augmented with **Chain-of-Thought (CoT) reasoning** generated by **DeepSeek-V3**. Fine-tuning enhances the model's **mathematical problem-solving abilities**, allowing it to provide **step-by-step solutions** with deeper reasoning.

### **🔹 Key Features**

- **Base Model**: DeepSeek-R1-Distill-Qwen-1.5B
- **Fine-Tuned On**: GSM8K dataset with DeepSeek-V3-enhanced reasoning
- **Improved Mathematical Reasoning**: Generates detailed step-by-step CoT explanations
- **Optimized for GRPO Training**: Trained using `trl` and `unsloth` for efficient fine-tuning

---

## **📊 Dataset & Training Details**

- **Dataset**: `eagle0504/openai-gsm8k-enhanced-using-together-ai-deepseek-train8k-test1k-v1`
  - **8K train samples**, **1K test samples**
  - Each example contains a **question**, an **answer**, and **CoT reasoning**
- **Training Methodology**:
  - Used **Group Relative Policy Optimization (GRPO)** via `trl` (a minimal sketch follows this list)
  - Applied **gradient accumulation** to emulate larger effective batch sizes
  - Integrated **DeepSeek-V3 augmentation** for enhanced logical reasoning
- **Fine-Tuning Tools**:
  - **Unsloth** for memory-efficient fine-tuning
  - **Hugging Face Transformers** for model training
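
For orientation, here is a minimal sketch of what the GRPO setup can look like with plain `trl` (without the Unsloth optimizations). It assumes a recent `trl` release that ships `GRPOTrainer` and that the dataset's question column is named `question`; the reward function is a hypothetical placeholder rather than the one actually used, so treat it as a starting point and refer to the Colab notebook below for the full recipe.

```python
# Minimal GRPO sketch with `datasets` + `trl` (assumes trl >= 0.14,
# which ships GRPOTrainer). The reward below is a hypothetical
# placeholder, not the reward actually used for this model.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

dataset = load_dataset(
    "eagle0504/openai-gsm8k-enhanced-using-together-ai-deepseek-train8k-test1k-v1",
    split="train",
)

# GRPOTrainer expects a "prompt" column; map the question onto it
# (assumes the column is named "question", as the card describes)
dataset = dataset.map(lambda row: {"prompt": row["question"]})

def reward_detailed_reasoning(completions, **kwargs):
    # Hypothetical reward: favor longer, more detailed reasoning
    # traces, capped so length alone cannot dominate
    return [min(len(c) / 1000.0, 1.0) for c in completions]

training_args = GRPOConfig(
    output_dir="grpo-gsm8k",
    num_generations=4,              # completions sampled per prompt
    per_device_train_batch_size=4,  # must be divisible by num_generations
    gradient_accumulation_steps=4,  # emulates a larger effective batch
)

trainer = GRPOTrainer(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    reward_funcs=reward_detailed_reasoning,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```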

For those interested in replicating the fine-tuning process, I have shared an **updated Colab notebook** 👇:

🔗 [Colab Notebook](https://colab.research.google.com/drive/1HV0YkyiTD55j1xLRBHwJ_q3ex82W5EXr?usp=sharing)

You will need:

- ✅ a Hugging Face token
- ✅ a Together.AI API key
- ✅ the Unsloth package

---

## **🚀 How to Run the Model (Mac via `llama.cpp`)**

Yes! You can run this model **locally on macOS** using `llama.cpp`.

### **1️⃣ Install Homebrew (If Not Installed)**

```sh
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```

Then add Homebrew to your PATH:

```sh
echo 'eval "$(/opt/homebrew/bin/brew shellenv)"' >> ~/.zprofile
eval "$(/opt/homebrew/bin/brew shellenv)"
```

---

### **2️⃣ Install `llama.cpp`**

```sh
brew install llama.cpp
```
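
To confirm the install, you can print the build info (recent llama.cpp builds support `--version`):

```sh
which llama-cli      # should print /opt/homebrew/bin/llama-cli on Apple silicon
llama-cli --version  # prints the llama.cpp build info
```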

---

### **3️⃣ Run the Model with `llama-cli`**

```sh
llama-cli -hf eagle0504/deepseek-r1-qwen-1.5b-gsm8k-enhanced-gguf:Q8_0
```
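
To run a single prompt instead of dropping into the interactive chat, pass it with `-p` and bound the response length with `-n`:

```sh
llama-cli -hf eagle0504/deepseek-r1-qwen-1.5b-gsm8k-enhanced-gguf:Q8_0 \
  -p "A farmer has 24 apples. He gives 6 to each of his 3 children. How many does he have left?" \
  -n 256
```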

---

### **4️⃣ Alternative: Run Locally via GGUF**

```sh
mkdir -p ~/llama_models && cd ~/llama_models

# macOS does not ship wget; `brew install wget` first, or use `curl -LO` instead
wget https://huggingface.co/eagle0504/deepseek-r1-qwen-1.5b-gsm8k-enhanced-gguf/resolve/main/Q8_0.gguf

llama-cli -m ~/llama_models/Q8_0.gguf --interactive
```

---

## **🐍 How to Use the Model via Python (`transformers`)**

You can load the model with **Hugging Face Transformers**:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "eagle0504/deepseek-r1-qwen-1.5b-gsm8k-enhanced"

# Load the fine-tuned checkpoint and its tokenizer from the Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "A farmer has 24 apples. He gives 6 to each of his 3 children. How many does he have left?"
inputs = tokenizer(prompt, return_tensors="pt")

# max_new_tokens bounds only the generated continuation, unlike
# max_length, which also counts the prompt tokens
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
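
Because the base model is a chat-style reasoning model, routing the prompt through the tokenizer's chat template may yield better-structured step-by-step output. Continuing from the snippet above (a sketch, assuming the fine-tuned checkpoint inherits the DeepSeek-R1 distill chat template):

```python
# Wrap the question as a chat message and apply the model's template
messages = [{"role": "user", "content": prompt}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

output = model.generate(input_ids, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the echoed prompt
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```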

---

## **🔬 Expected Performance**

Compared to the base **DeepSeek-R1-Distill-Qwen-1.5B**, this fine-tuned model:

- Provides **more detailed Chain-of-Thought (CoT) explanations** for GSM8K problems.
- Improves **logical reasoning and step-by-step answer formulation**.
- Generates **clearer, more structured solutions**, making it **ideal for educational use**.

---

## **📜 Model Hosting & License**

📍 **Model on Hugging Face Hub**:
🔗 **[eagle0504/deepseek-r1-qwen-1.5b-gsm8k-enhanced](https://huggingface.co/eagle0504/deepseek-r1-qwen-1.5b-gsm8k-enhanced)**

📜 **License**: MIT License, open for modification and distribution.

---

If you have **feedback or ideas for improvement**, feel free to reach out! 🚀🔥

#AI #MachineLearning #DeepSeek #GSM8K #LLM #ChainOfThought #HuggingFace #GRPO #Reasoning