---
license: mit
datasets:
- eagle0504/openai-gsm8k-enhanced-using-together-ai-deepseek-train8k-test1k-v1
language:
- en
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
library_name: transformers
tags:
- fine-tuned
- unsloth
- trl
- grpo
- deepseek
- gsm8k
- reasoning
---
# **DeepSeek-R1-Distill-Qwen-1.5B Fine-Tuned on GSM8K with Chain-of-Thought Augmentation**
## **Model Overview**
This model is a fine-tuned version of **DeepSeek-R1-Distill-Qwen-1.5B**, trained on the **OpenAI GSM8K dataset**, augmented with **Chain-of-Thought (CoT) reasoning** using **DeepSeek-V3**. The fine-tuning process enhances the model’s **mathematical problem-solving abilities**, allowing it to provide **step-by-step solutions** with deeper reasoning.
### **πŸ”Ή Key Features**
- **Base Model**: DeepSeek-R1-Distill-Qwen-1.5B
- **Fine-Tuned On**: GSM8K dataset with DeepSeek-V3-enhanced reasoning
- **Improved Mathematical Reasoning**: Generates detailed step-by-step CoT explanations
- **Optimized for GRPO Training**: Trained using `trl` and `unsloth` for efficient fine-tuning
---
## **πŸ“Š Dataset & Training Details**
- **Dataset**: `eagle0504/openai-gsm8k-enhanced-using-together-ai-deepseek-train8k-test1k-v1`
- **8K train samples**, **1K test samples**
- Contains **question**, **answer**, and **CoT reasoning**
- **Training Methodology**:
- Used **Group Relative Policy Optimization (GRPO)** via `trl` (see the sketch after this list)
- Applied **gradient accumulation** to manage larger batch sizes
- Integrated **DeepSeek-V3 augmentation** for enhanced logical reasoning
- **Fine-tuning Tools**:
- **Unsloth** for memory-efficient fine-tuning of the Qwen-based model
- **Hugging Face Transformers** for model training
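To make the training setup concrete, here is a minimal GRPO sketch using plain `trl` (without the Unsloth integration used in the actual run). The `question` column mapping, the reward function, and the hyperparameters are illustrative assumptions; the exact recipe is in the Colab notebook linked below.
```python
# Minimal GRPO fine-tuning sketch with trl. Column names, the reward function,
# and hyperparameters are illustrative assumptions, not the exact training recipe.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Load the CoT-enhanced GSM8K dataset; GRPOTrainer expects a "prompt" column
dataset = load_dataset(
    "eagle0504/openai-gsm8k-enhanced-using-together-ai-deepseek-train8k-test1k-v1",
    split="train",
)
dataset = dataset.map(lambda row: {"prompt": row["question"]})  # assumed column name

# Toy reward: favor completions that state an explicit answer.
# The real run scores completions against the reference answers.
def reward_has_answer(completions, **kwargs):
    return [1.0 if "answer" in completion.lower() else 0.0 for completion in completions]

training_args = GRPOConfig(
    output_dir="grpo-gsm8k",
    gradient_accumulation_steps=4,  # mirrors the gradient accumulation noted above
    logging_steps=10,
)

trainer = GRPOTrainer(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    reward_funcs=reward_has_answer,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```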
For those interested in replicating the fine-tuning process, I have shared an **updated Colab notebook** πŸ““:
πŸ”— [Colab Notebook](https://colab.research.google.com/drive/1HV0YkyiTD55j1xLRBHwJ_q3ex82W5EXr?usp=sharing)
You will need:
βœ… Hugging Face Token
βœ… Together.AI API Key
βœ… Unsloth Package
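A typical environment setup looks like the following (package names only; exact versions may differ, so follow the notebook):
```sh
pip install unsloth trl transformers datasets
```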
---
## **πŸš€ How to Run the Model (Mac via `llama.cpp`)**
You can run this model **locally on macOS** using `llama.cpp`.
### **1️⃣ Install Homebrew (If Not Installed)**
```sh
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```
Then add Homebrew to your PATH:
```sh
echo 'eval "$(/opt/homebrew/bin/brew shellenv)"' >> ~/.zprofile
eval "$(/opt/homebrew/bin/brew shellenv)"
```
---
### **2️⃣ Install `llama.cpp`**
```sh
brew install llama.cpp
```
---
### **3️⃣ Run the Model with `llama-cli`**
```sh
llama-cli -hf eagle0504/deepseek-r1-qwen-1.5b-gsm8k-enhanced-gguf:Q8_0
```
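For a quick one-shot test without entering interactive mode, you can pass a prompt and a generation limit directly (the prompt and token count here are just an example):
```sh
llama-cli -hf eagle0504/deepseek-r1-qwen-1.5b-gsm8k-enhanced-gguf:Q8_0 \
  -p "A farmer has 24 apples. He gives 6 to each of his 3 children. How many does he have left?" \
  -n 256
```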
---
### **4️⃣ Alternative: Run Locally via GGUF**
```sh
mkdir -p ~/llama_models && cd ~/llama_models
# Download the Q8_0 quantization (use `curl -L -O <url>` if wget is not installed)
wget https://huggingface.co/eagle0504/deepseek-r1-qwen-1.5b-gsm8k-enhanced-gguf/resolve/main/Q8_0.gguf
llama-cli -m ~/llama_models/Q8_0.gguf --interactive
```
---
## **πŸ“Œ How to Use Model via Python (`transformers`)**
You can load the model with **Hugging Face Transformers**:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "eagle0504/deepseek-r1-qwen-1.5b-gsm8k-enhanced"

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "A farmer has 24 apples. He gives 6 to each of his 3 children. How many does he have left?"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate a step-by-step solution (cap applies to newly generated tokens)
output = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
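Because the base model is an instruct-style chat model, you may get better-structured CoT output by prompting through the tokenizer's chat template. A minimal sketch, reusing the `tokenizer`, `model`, and `prompt` objects from above (the `max_new_tokens` value is illustrative):
```python
# Sketch: prompt via the chat template instead of raw text
messages = [{"role": "user", "content": prompt}]
chat_inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output = model.generate(chat_inputs, max_new_tokens=512)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```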
---
## **πŸ”¬ Expected Performance**
Compared to the base **DeepSeek-R1-Distill-Qwen-1.5B**, this fine-tuned model:
- Provides **more detailed Chain-of-Thought (CoT) explanations** for GSM8K problems.
- Improves **logical reasoning and step-by-step answer formulation**.
- Generates **clearer, more structured solutions**, making it well-suited for educational use.
---
## **πŸ—‚ Model Hosting & License**
πŸ“Œ **Model on Hugging Face Hub**:
πŸ‘‰ **[eagle0504/deepseek-r1-qwen-1.5b-gsm8k-enhanced](https://huggingface.co/eagle0504/deepseek-r1-qwen-1.5b-gsm8k-enhanced)**
πŸ“œ **License**: MIT License – Open for modification and distribution.
---
If you have **feedback or ideas for improvement**, feel free to reach out! πŸš€πŸ”₯
#AI #MachineLearning #DeepSeek #GSM8K #LLM #ChainOfThought #HuggingFace #GRPO #Reasoning