---
license: mit
language:
- en
base_model:
- Qwen/Qwen2.5-3B-Instruct
---

# Qwen2.5-3B-Instruct Fine-Tuned Model

## 📌 Model Overview

This repository contains a version of **Qwen2.5-3B-Instruct** fine-tuned with Unsloth. The model is tuned for **multi-hop reasoning, scientific Q&A, and retrieval-augmented generation (RAG)** with FAISS and BM25 retrieval.

- **Base Model**: [Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct)
- **Fine-Tuning Framework**: Unsloth
- **Quantization**: 4-bit GGUF and 16-bit versions available
- **Training Methods**: SFT (Supervised Fine-Tuning) + ORPO (Odds Ratio Preference Optimization)

---

## 🔥 Fine-Tuning Details

### **1️⃣ Datasets Used**

- **HotpotQA**: Multi-hop reasoning dataset
- **Synthetic QA**: Question-answer pairs created from extracted document chunks
- **BM25 & FAISS Retrieval**: Used to retrieve relevant documents

### **2️⃣ Training Configuration**

- **LoRA Fine-Tuning**: PEFT with Unsloth
- **Hyperparameters**:
  - `r=16, lora_alpha=16, lora_dropout=0`
  - `gradient_accumulation_steps=4`
  - `max_seq_length=2048`
  - `learning_rate=2e-4`
  - `max_steps=200`
  - `optimizer=adamw_8bit`
- **Preference Optimization** (ORPO): Used to improve reasoning performance

---

## 📁 Files Included

- `pytorch_model-00001-of-00002.bin` - Model weights (shard 1 of 2)
- `pytorch_model-00002-of-00002.bin` - Model weights (shard 2 of 2)
- `pytorch_model.bin.index.json` - Index mapping each weight tensor to its shard
- `config.json` - Model configuration
- `tokenizer.json` - Tokenizer configuration
- `tokenizer_config.json`
- `merges.txt` - BPE merge rules
- `vocab.json` - Token vocabulary
- `special_tokens_map.json`
- `generation_config.json` - Default generation settings
- `unsloth.Q4_K_M.gguf` - **Quantized 4-bit version** for llama.cpp
- `unsloth.F16.gguf` - **16-bit version** for unquantized, half-precision inference

---

## 🚀 Model Usage

### **Load Model in Python**

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "HasinduNimesh/qwen3b-finetuned"

# device_map="auto" places the weights on the available GPU(s) or CPU; it requires the accelerate package.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

input_text = "Why is it necessary to filter out chain-of-thought outputs with mixed languages, long paragraphs, and code blocks?"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

# max_new_tokens limits only the generated continuation; max_length would also count the prompt tokens.
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

### **Use with llama-cpp-python (4-bit GGUF)**

```python
from llama_cpp import Llama

# model_path points to the locally downloaded GGUF file; n_ctx matches the training max_seq_length.
llm = Llama(model_path="unsloth.Q4_K_M.gguf", n_ctx=2048)

prompt = "Summarize the latest research on AI safety."
output = llm(prompt, max_tokens=200)
print(output["choices"][0]["text"])
```

---

## 🛠 Future Improvements

- **Improve dataset diversity**: Add a wider range of reasoning datasets
- **Optimize retrieval**: Enhance FAISS & BM25 hybrid retrieval
- **Expand preference optimization**: Improve the preference data used for ORPO

---

## 🛡️ License

This model is available under the **Apache 2.0 License**. Please follow [Hugging Face’s guidelines](https://huggingface.co/docs/hub/models-the-hub) for responsible AI usage.

---

## 🤝 Acknowledgements

- **Unsloth**: For efficient Qwen fine-tuning
- **Hugging Face**: Model hosting & dataset tools
- **DeepSeek & Qwen Teams**: For providing base models

---

_📢 For issues or improvements, please open a discussion on Hugging Face!_ 🚀
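
---

## 🔎 Hybrid Retrieval Sketch

This card mentions BM25 and FAISS retrieval but does not say how the two result lists are combined. One common option is reciprocal rank fusion (RRF), sketched below as an illustration only, not as the method used for this model. The function name `reciprocal_rank_fusion`, the constant `k=60`, and the document IDs are all assumptions made for the example; in practice the two input lists would come from a BM25 index and a FAISS index queried with the same question.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores 1 / (k + rank) per list it appears in (rank starts at 1),
    so documents ranked highly by both retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, highest first; ties are broken by document ID.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for one query (placeholder IDs, not real data).
bm25_hits = ["doc3", "doc1", "doc7"]   # lexical ranking, e.g. from BM25
faiss_hits = ["doc1", "doc5", "doc3"]  # dense ranking, e.g. from FAISS

fused = reciprocal_rank_fusion([bm25_hits, faiss_hits])
print(fused)
```

RRF uses only rank positions, so BM25 scores and FAISS distances never need to be normalized onto a common scale. That makes it a convenient baseline before trying weighted score combinations.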