---
license: mit
language:
- en
base_model:
- Qwen/Qwen2.5-3B-Instruct
---
# Qwen2.5-3B-Instruct Fine-Tuned Model

## 📌 Model Overview
This repository contains a fine-tuned version of **Qwen2.5-3B-Instruct** using Unsloth. The model is optimized for **multi-hop reasoning, scientific Q&A, and retrieval-augmented generation (RAG)** with FAISS and BM25 retrieval.

- **Base Model**: [Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct)
- **Fine-Tuning Framework**: Unsloth
- **Quantization**: 4-bit GGUF & 16-bit versions available
- **Training Methods**: SFT (Supervised Fine-Tuning) + ORPO (Offline Reward Preference Optimization)

---
## 🔥 Fine-Tuning Details
### **1️⃣ Datasets Used**
- **HotpotQA**: Multi-hop reasoning dataset
- **Synthetic QA**: Created using extracted document chunks
- **BM25 & FAISS Retrieval**: Used to retrieve relevant documents

### **2️⃣ Training Configuration**
- **LoRA Fine-Tuning**: PEFT with Unsloth
- **Hyperparameters**:
  - `r=16, lora_alpha=16, lora_dropout=0`
  - `gradient_accumulation_steps=4`
  - `max_seq_length=2048`
  - `learning_rate=2e-4`
  - `max_steps=200`
  - `optimizer=adamw_8bit`
  
- **RL Fine-Tuning** (ORPO): Used for improving reasoning performance

---
## 📁 Files Included
- `pytorch_model-00001-of-00002.bin` - Model weights
- `pytorch_model-00002-of-00002.bin`
- `pytorch_model.bin.index.json` - Index of model checkpoints
- `config.json` - Model configuration
- `tokenizer.json` - Tokenizer configuration
- `tokenizer_config.json`
- `merges.txt` - BPE merge rules
- `vocab.json` - Token vocabulary
- `special_tokens_map.json`
- `generation_config.json` - Default generation settings
- `unsloth.Q4_K_M.gguf` - **Quantized 4-bit version** for Llama-CPP
- `unsloth.F16.gguf` - **16-bit version** for full precision inference

---
## 🚀 Model Usage
### **Load Model in Python**
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "HasinduNimesh/qwen3b-finetuned"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

input_text = "Why is it necessary to filter out chain-of-thought outputs with mixed languages, long paragraphs, and code blocks?"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_length=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

### **Use with Llama-CPP (4-bit GGUF)**
```python
from llama_cpp import Llama

llm = Llama(model_path="unsloth.Q4_K_M.gguf", n_ctx=2048)
prompt = "Summarize the latest research on AI safety."
output = llm(prompt, max_tokens=200)
print(output["choices"][0]["text"])
```

---
## 🛠 Future Improvements
- **Improve dataset diversity**: Add more diverse reasoning datasets
- **Optimize retrieval**: Enhance FAISS & BM25 hybrid retrieval
- **Expand RL fine-tuning**: Improve reward models for ORPO

---
## 🛡️ License
This model is available under the **Apache 2.0 License**. Please follow [Hugging Face’s guidelines](https://huggingface.co/docs/hub/models-the-hub) for responsible AI usage.

---
## 🤝 Acknowledgements
- **Unsloth**: For efficient Qwen fine-tuning
- **Hugging Face**: Model hosting & dataset tools
- **DeepSeek & Qwen Teams**: For providing base models

---
_📢 For issues or improvements, please open a discussion on Hugging Face!_ 🚀