---
license: mit
datasets:
  - eagle0504/openai-gsm8k-enhanced-using-together-ai-deepseek-train8k-test1k-v1
language:
  - en
base_model:
  - deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
library_name: transformers
tags:
  - fine-tuned
  - unsloth
  - trl
  - grpo
  - deepseek
  - gsm8k
  - reasoning
---

# **DeepSeek-R1-Distill-Qwen-1.5B Fine-Tuned on GSM8K with Chain-of-Thought Augmentation**

## **Model Overview**

This model is a fine-tuned version of **DeepSeek-R1-Distill-Qwen-1.5B**, trained on the **OpenAI GSM8K dataset** augmented with **Chain-of-Thought (CoT) reasoning** generated by **DeepSeek-V3**. Fine-tuning enhances the model's **mathematical problem-solving abilities**, allowing it to produce **step-by-step solutions** with deeper reasoning.

### **🔹 Key Features**

- **Base Model**: DeepSeek-R1-Distill-Qwen-1.5B
- **Fine-Tuned On**: GSM8K dataset with DeepSeek-V3-enhanced reasoning
- **Improved Mathematical Reasoning**: Generates detailed step-by-step CoT explanations
- **Optimized for GRPO Training**: Trained with `trl` and `unsloth` for efficient fine-tuning

---

## **📊 Dataset & Training Details**

- **Dataset**: `eagle0504/openai-gsm8k-enhanced-using-together-ai-deepseek-train8k-test1k-v1`
  - **8K train samples**, **1K test samples**
  - Each sample contains a **question**, an **answer**, and **CoT reasoning**
- **Training Methodology**:
  - Used **Group Relative Policy Optimization (GRPO)** via `trl`
  - Applied **gradient accumulation** to simulate larger batch sizes
  - Integrated **DeepSeek-V3 augmentation** for enhanced logical reasoning
- **Fine-tuning Tools**:
  - **Unsloth** for memory-efficient fine-tuning
  - **Hugging Face Transformers** for model training

For those interested in replicating the fine-tuning process, I have shared an **updated Colab notebook** 📓:

🔗 [Colab Notebook](https://colab.research.google.com/drive/1HV0YkyiTD55j1xLRBHwJ_q3ex82W5EXr?usp=sharing)

You will need:

- ✅ Hugging Face Token
- ✅ Together.AI API Key
- ✅ Unsloth Package

---

## **🚀 How to Run the Model (Mac via `llama.cpp`)**

Yes! You can run this model **locally on macOS** using `llama.cpp`.

### **1️⃣ Install Homebrew (If Not Installed)**

```sh
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```

Then add Homebrew to your PATH:

```sh
echo 'eval "$(/opt/homebrew/bin/brew shellenv)"' >> ~/.zprofile
eval "$(/opt/homebrew/bin/brew shellenv)"
```

---

### **2️⃣ Install `llama.cpp`**

```sh
brew install llama.cpp
```

---

### **3️⃣ Run the Model with `llama-cli`**

```sh
llama-cli -hf eagle0504/deepseek-r1-qwen-1.5b-gsm8k-enhanced-gguf:Q8_0
```

---

### **4️⃣ Alternative: Run Locally via GGUF**

```sh
mkdir -p ~/llama_models && cd ~/llama_models
wget https://huggingface.co/eagle0504/deepseek-r1-qwen-1.5b-gsm8k-enhanced-gguf/resolve/main/Q8_0.gguf
llama-cli -m ~/llama_models/Q8_0.gguf --interactive
```

---

## **📌 How to Use the Model via Python (`transformers`)**

You can load the model with **Hugging Face Transformers**:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "eagle0504/deepseek-r1-qwen-1.5b-gsm8k-enhanced"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "A farmer has 24 apples. He gives 6 to each of his 3 children. How many does he have left?"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=200)

print(tokenizer.decode(output[0], skip_special_tokens=True))
```

---

## **🔬 Expected Performance**

Compared to the base **DeepSeek-R1-Distill-Qwen-1.5B**, this fine-tuned model:

- Provides **more detailed Chain-of-Thought (CoT) explanations** for GSM8K problems.
- Improves **logical reasoning and step-by-step answer formulation**.
- Generates **clearer, more structured solutions**, making it **well suited for educational use**.
--- ## **πŸ—‚ Model Hosting & License** πŸ“Œ **Model on Hugging Face Hub**: πŸ‘‰ **[eagle0504/deepseek-r1-qwen-1.5b-gsm8k-enhanced](https://huggingface.co/eagle0504/deepseek-r1-qwen-1.5b-gsm8k-enhanced)** πŸ“œ **License**: MIT License – Open for modification and distribution. --- If you have **feedback or ideas for improvement**, feel free to reach out! πŸš€πŸ”₯ #AI #MachineLearning #DeepSeek #GSM8K #LLM #ChainOfThought #HuggingFace #GRPO #Reasoning ```