---
license: mit
datasets:
- eagle0504/openai-gsm8k-enhanced-using-together-ai-deepseek-train8k-test1k-v1
language:
- en
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
library_name: transformers
tags:
- fine-tuned
- unsloth
- trl
- grpo
- deepseek
- gsm8k
- reasoning
---
# **DeepSeek-R1-Distill-Qwen-1.5B Fine-Tuned on GSM8K with Chain-of-Thought Augmentation**
## **Model Overview**
This model is a fine-tuned version of **DeepSeek-R1-Distill-Qwen-1.5B**, trained on the **OpenAI GSM8K dataset**, augmented with **Chain-of-Thought (CoT) reasoning** using **DeepSeek-V3**. The fine-tuning process enhances the model's **mathematical problem-solving abilities**, allowing it to provide **step-by-step solutions** with deeper reasoning.
### **Key Features**
- **Base Model**: DeepSeek-R1-Distill-Qwen-1.5B
- **Fine-Tuned On**: GSM8K dataset with DeepSeek-V3-enhanced reasoning
- **Improved Mathematical Reasoning**: Generates detailed step-by-step CoT explanations
- **Optimized for GRPO Training**: Trained using `trl` and `unsloth` for efficient fine-tuning
---
## **Dataset & Training Details**
- **Dataset**: `eagle0504/openai-gsm8k-enhanced-using-together-ai-deepseek-train8k-test1k-v1`
- **8K train samples**, **1K test samples**
- Contains **question**, **answer**, and **CoT reasoning**
- **Training Methodology**:
  - Used **Group Relative Policy Optimization (GRPO)** via `trl`
- Applied **gradient accumulation** to manage larger batch sizes
- Integrated **DeepSeek-V3 augmentation** for enhanced logical reasoning
- **Fine-tuning Tools**:
- **Unsloth** for memory-efficient Llama-based tuning
- **Hugging Face Transformers** for model training
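GRPO scores sampled completions with reward functions rather than a learned value model. As a rough illustration only (this is not the actual training code; the function names and signature below are hypothetical, and `trl`'s `GRPOTrainer` passes prompts and completions in its own format), a GSM8K-style correctness reward might look like:

```python
import re


def extract_final_answer(text):
    """Pull the last number from a completion (GSM8K reference answers are numeric)."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return matches[-1] if matches else None


def correctness_reward(completions, answers):
    """Return 1.0 for each completion whose final number matches the reference answer."""
    rewards = []
    for completion, answer in zip(completions, answers):
        predicted = extract_final_answer(completion)
        rewards.append(1.0 if predicted == str(answer) else 0.0)
    return rewards
```

A reward like this only checks the final answer; in practice it is often combined with format rewards that encourage well-structured step-by-step reasoning.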
For those interested in replicating the fine-tuning process, I have shared an **updated Colab notebook**:
[Colab Notebook](https://colab.research.google.com/drive/1HV0YkyiTD55j1xLRBHwJ_q3ex82W5EXr?usp=sharing)
You will need:
- A Hugging Face token
- A Together.AI API key
- The `unsloth` package
---
## **How to Run the Model (Mac via `llama.cpp`)**
You can run this model **locally on macOS** using `llama.cpp`.
### **1. Install Homebrew (If Not Installed)**
```sh
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```
Then add Homebrew to your PATH:
```sh
echo 'eval "$(/opt/homebrew/bin/brew shellenv)"' >> ~/.zprofile
eval "$(/opt/homebrew/bin/brew shellenv)"
```
---
### **2. Install `llama.cpp`**
```sh
brew install llama.cpp
```
---
### **3. Run the Model with `llama-cli`**
```sh
llama-cli -hf eagle0504/deepseek-r1-qwen-1.5b-gsm8k-enhanced-gguf:Q8_0
```
---
### **4. Alternative: Run Locally via GGUF**
```sh
mkdir -p ~/llama_models && cd ~/llama_models
curl -L -o Q8_0.gguf https://huggingface.co/eagle0504/deepseek-r1-qwen-1.5b-gsm8k-enhanced-gguf/resolve/main/Q8_0.gguf
llama-cli -m ~/llama_models/Q8_0.gguf --interactive
```
---
## **How to Use the Model via Python (`transformers`)**
You can load the model with **Hugging Face Transformers**:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "eagle0504/deepseek-r1-qwen-1.5b-gsm8k-enhanced"

# Load the fine-tuned model and its tokenizer from the Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "A farmer has 24 apples. He gives 6 to each of his 3 children. How many does he have left?"
inputs = tokenizer(prompt, return_tensors="pt")

# Bound the number of newly generated tokens; long chain-of-thought
# outputs can easily exceed a small max_length budget
output = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
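DeepSeek-R1 distilled models typically wrap their chain-of-thought in `<think>...</think>` tags before emitting the final answer. If that convention carries over to this fine-tune (an assumption, not verified here), the decoded output can be split into reasoning and answer with a small helper:

```python
import re


def split_reasoning(output):
    """Separate the chain-of-thought from the final answer.

    Assumes the DeepSeek-R1 convention of <think>...</think> around the
    reasoning; if the tags are absent, the whole output is the answer.
    """
    match = re.search(r"<think>(.*?)</think>", output, flags=re.DOTALL)
    if match:
        reasoning = match.group(1).strip()
        answer = output[match.end():].strip()
    else:
        reasoning, answer = "", output.strip()
    return reasoning, answer
```

This is convenient for educational use, where you may want to display the step-by-step reasoning separately from the boxed final answer.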
---
## **Expected Performance**
Compared to the base **DeepSeek-R1-Distill-Qwen-1.5B**, this fine-tuned model:
- Provides **more detailed Chain-of-Thought (CoT) explanations** for GSM8K problems.
- Improves **logical reasoning and step-by-step answer formulation**.
- Generates **clearer, more structured solutions**, making it **ideal for educational use**.
---
## **Model Hosting & License**
**Model on Hugging Face Hub**: [eagle0504/deepseek-r1-qwen-1.5b-gsm8k-enhanced](https://huggingface.co/eagle0504/deepseek-r1-qwen-1.5b-gsm8k-enhanced)
**License**: MIT License, open for modification and distribution.
---
If you have **feedback or ideas for improvement**, feel free to reach out!
#AI #MachineLearning #DeepSeek #GSM8K #LLM #ChainOfThought #HuggingFace #GRPO #Reasoning