---
license: mit
language:
- en
base_model:
- Qwen/Qwen2.5-3B-Instruct
---
# Qwen2.5-3B-Instruct Fine-Tuned Model

## 📌 Model Overview
This repository contains a fine-tuned version of **Qwen2.5-3B-Instruct** using Unsloth. The model is optimized for **multi-hop reasoning, scientific Q&A, and retrieval-augmented generation (RAG)** with FAISS and BM25 retrieval.

- **Base Model**: [Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct)
- **Fine-Tuning Framework**: Unsloth
- **Quantization**: 4-bit GGUF & 16-bit versions available
- **Training Methods**: SFT (Supervised Fine-Tuning) + ORPO (Odds Ratio Preference Optimization)

---
## 🔥 Fine-Tuning Details
### **1️⃣ Datasets Used**
- **HotpotQA**: Multi-hop reasoning dataset
- **Synthetic QA**: Question–answer pairs generated from extracted document chunks
- **BM25 & FAISS Retrieval**: Used to retrieve relevant documents (an illustrative sketch follows this list)

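The retrieval code itself is not published in this repo, so the following is only an illustrative sketch of a BM25 + FAISS hybrid retriever. The libraries (`rank_bm25`, `sentence-transformers`, `faiss`), the encoder checkpoint, and the blending weight `alpha` are assumptions, not the project's actual pipeline:

```python
# Illustrative hybrid retrieval: BM25 for lexical matching, FAISS for
# semantic similarity. All library and parameter choices are assumptions.
import faiss
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

docs = [
    "DeepSeek R1 is a reasoning-focused language model.",
    "FAISS performs fast approximate nearest-neighbor search.",
    "BM25 ranks documents by lexical term overlap.",
]

# Lexical index over whitespace-tokenized documents
bm25 = BM25Okapi([d.lower().split() for d in docs])

# Dense index over normalized embeddings (inner product == cosine)
encoder = SentenceTransformer("all-MiniLM-L6-v2")
emb = encoder.encode(docs, normalize_embeddings=True).astype(np.float32)
index = faiss.IndexFlatIP(emb.shape[1])
index.add(emb)

def hybrid_search(query: str, k: int = 2, alpha: float = 0.5):
    """Blend normalized BM25 and dense scores; alpha weights the dense side."""
    lexical = np.array(bm25.get_scores(query.lower().split()))
    lexical = lexical / (lexical.max() + 1e-9)  # scale to [0, 1]
    q = encoder.encode([query], normalize_embeddings=True).astype(np.float32)
    scores, ids = index.search(q, len(docs))
    dense = np.zeros(len(docs))
    dense[ids[0]] = scores[0]
    blended = alpha * dense + (1 - alpha) * lexical
    return [docs[i] for i in np.argsort(-blended)[:k]]

print(hybrid_search("How does BM25 rank documents?"))
```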
### **2️⃣ Training Configuration**
- **LoRA Fine-Tuning**: PEFT with Unsloth
- **Hyperparameters** (a training sketch follows this list):
  - `r=16, lora_alpha=16, lora_dropout=0`
  - `gradient_accumulation_steps=4`
  - `max_seq_length=2048`
  - `learning_rate=2e-4`
  - `max_steps=200`
  - `optimizer=adamw_8bit`

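Assembled into code, the SFT stage plausibly resembled the sketch below, following the common Unsloth + TRL recipe (a trl version where `SFTTrainer` still accepts `dataset_text_field` directly). The `target_modules` list, batch size, placeholder dataset, and `output_dir` are assumptions not stated in this card:

```python
# Sketch of the SFT stage using the hyperparameters listed above.
from datasets import Dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen2.5-3B-Instruct",
    max_seq_length=2048,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    # Common choice for Qwen-style models; not confirmed by this card.
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Placeholder dataset; the real data was HotpotQA plus synthetic QA.
train_dataset = Dataset.from_dict({"text": ["Q: Example question?\nA: Example answer."]})

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,  # assumption; not stated in the card
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        max_steps=200,
        optim="adamw_8bit",
        output_dir="outputs",
    ),
)
trainer.train()
```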
- **Preference Fine-Tuning (ORPO)**: Used to improve reasoning performance (a trainer sketch follows below)

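The ORPO stage is likewise unpublished; below is a minimal sketch assuming TRL's `ORPOTrainer` (in a trl version that accepts `tokenizer=`), reusing `model` and `tokenizer` from the SFT sketch, with `beta` and the preference examples as placeholders:

```python
# Hypothetical ORPO stage; dataset contents and beta are placeholders.
from datasets import Dataset
from trl import ORPOConfig, ORPOTrainer

pref_data = Dataset.from_dict({
    "prompt":   ["Which city hosted the 2012 Summer Olympics?"],
    "chosen":   ["London hosted the 2012 Summer Olympics."],
    "rejected": ["Paris hosted the 2012 Summer Olympics."],
})

trainer = ORPOTrainer(
    model=model,  # the SFT-tuned model from the previous sketch
    args=ORPOConfig(
        beta=0.1,  # odds-ratio loss weight; assumption
        learning_rate=2e-4,
        gradient_accumulation_steps=4,
        max_steps=200,
        output_dir="orpo-outputs",
    ),
    train_dataset=pref_data,
    tokenizer=tokenizer,
)
trainer.train()
```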
---
## 📁 Files Included
- `pytorch_model-00001-of-00002.bin` - Model weights (shard 1 of 2)
- `pytorch_model-00002-of-00002.bin` - Model weights (shard 2 of 2)
- `pytorch_model.bin.index.json` - Index mapping weight tensors to shards
- `config.json` - Model configuration
- `tokenizer.json` - Tokenizer configuration
- `tokenizer_config.json` - Tokenizer settings
- `merges.txt` - BPE merge rules
- `vocab.json` - Token vocabulary
- `special_tokens_map.json` - Special-token mapping
- `generation_config.json` - Default generation settings
- `unsloth.Q4_K_M.gguf` - **Quantized 4-bit version** for llama.cpp
- `unsloth.F16.gguf` - **16-bit version** for full-precision inference

---
## 🚀 Model Usage
### **Load Model in Python**
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "HasinduNimesh/YOUR_REPO_NAME"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

input_text = "What is the impact of DeepSeek R1 on AI research?"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
# max_new_tokens caps the generated tokens; max_length would also count the prompt.
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
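Because the model is instruction-tuned, wrapping the prompt in the tokenizer's chat template usually yields better answers than raw text. A minimal variant of the snippet above:

```python
# Reuses model and tokenizer from the previous snippet.
messages = [{"role": "user", "content": "What is the impact of DeepSeek R1 on AI research?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated portion.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```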
### **Use with llama.cpp (4-bit GGUF)**
```python
from llama_cpp import Llama

# n_ctx matches the 2048-token context used during fine-tuning.
llm = Llama(model_path="unsloth.Q4_K_M.gguf", n_ctx=2048)
prompt = "Summarize the latest research on AI safety."
output = llm(prompt, max_tokens=200)
print(output["choices"][0]["text"])
```
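If `unsloth.Q4_K_M.gguf` is not already on disk, it can be fetched from the Hub first and the returned path passed as `model_path` above (the repo id is the same placeholder used earlier):

```python
from huggingface_hub import hf_hub_download

# Downloads the quantized file and returns its local path.
gguf_path = hf_hub_download(
    repo_id="HasinduNimesh/YOUR_REPO_NAME",  # placeholder repo id
    filename="unsloth.Q4_K_M.gguf",
)
```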
---
## 🛠 Future Improvements
- **Improve dataset diversity**: Add more varied reasoning datasets
- **Optimize retrieval**: Enhance the FAISS & BM25 hybrid retrieval
- **Expand preference fine-tuning**: Improve the preference data used for ORPO

---
## 🛡️ License
This model is released under the **MIT License**, as declared in the model card metadata above. Please follow [Hugging Face’s guidelines](https://huggingface.co/docs/hub/models-the-hub) for responsible AI usage.

---
## 🤝 Acknowledgements
- **Unsloth**: For efficient Qwen fine-tuning
- **Hugging Face**: For model hosting & dataset tools
- **DeepSeek & Qwen Teams**: For providing the base models

---
_📢 For issues or improvements, please open a discussion on Hugging Face!_ 🚀