---
license: mit
language:
- en
base_model:
- Qwen/Qwen2.5-3B-Instruct
---
# Qwen2.5-3B-Instruct Fine-Tuned Model

## 📌 Model Overview
This repository contains a fine-tuned version of **Qwen2.5-3B-Instruct** using Unsloth. The model is optimized for **multi-hop reasoning, scientific Q&A, and retrieval-augmented generation (RAG)** with FAISS and BM25 retrieval.

- **Base Model**: [Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct)
- **Fine-Tuning Framework**: Unsloth
- **Quantization**: 4-bit GGUF & 16-bit versions available
- **Training Methods**: SFT (Supervised Fine-Tuning) + ORPO (Odds Ratio Preference Optimization)

---
## 🔥 Fine-Tuning Details
### **1️⃣ Datasets Used**
- **HotpotQA**: Multi-hop reasoning dataset
- **Synthetic QA**: Question–answer pairs generated from extracted document chunks
- **BM25 & FAISS Retrieval**: Used to retrieve relevant documents (an illustrative sketch follows this list)

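The retrieval code itself is not published in this repo, so the following is only an illustrative sketch of a BM25 + FAISS hybrid retriever. The libraries (`rank_bm25`, `sentence-transformers`, `faiss`), the encoder checkpoint, and the blending weight `alpha` are assumptions, not the project's actual pipeline:

```python
# Illustrative hybrid retrieval: BM25 for lexical matching, FAISS for
# semantic similarity. All library and parameter choices are assumptions.
import faiss
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

docs = [
    "DeepSeek R1 is a reasoning-focused language model.",
    "FAISS performs fast approximate nearest-neighbor search.",
    "BM25 ranks documents by lexical term overlap.",
]

# Lexical index over whitespace-tokenized documents
bm25 = BM25Okapi([d.lower().split() for d in docs])

# Dense index over normalized embeddings (inner product == cosine)
encoder = SentenceTransformer("all-MiniLM-L6-v2")
emb = encoder.encode(docs, normalize_embeddings=True).astype(np.float32)
index = faiss.IndexFlatIP(emb.shape[1])
index.add(emb)

def hybrid_search(query: str, k: int = 2, alpha: float = 0.5):
    """Blend normalized BM25 and dense scores; alpha weights the dense side."""
    lexical = np.array(bm25.get_scores(query.lower().split()))
    lexical = lexical / (lexical.max() + 1e-9)  # scale to [0, 1]
    q = encoder.encode([query], normalize_embeddings=True).astype(np.float32)
    scores, ids = index.search(q, len(docs))
    dense = np.zeros(len(docs))
    dense[ids[0]] = scores[0]
    blended = alpha * dense + (1 - alpha) * lexical
    return [docs[i] for i in np.argsort(-blended)[:k]]

print(hybrid_search("How does BM25 rank documents?"))
```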
### **2️⃣ Training Configuration**
- **LoRA Fine-Tuning**: PEFT with Unsloth
- **Hyperparameters** (a training sketch follows this list):
  - `r=16, lora_alpha=16, lora_dropout=0`
  - `gradient_accumulation_steps=4`
  - `max_seq_length=2048`
  - `learning_rate=2e-4`
  - `max_steps=200`
  - `optimizer=adamw_8bit`

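Assembled into code, the SFT stage plausibly resembled the sketch below, following the common Unsloth + TRL recipe (a trl version where `SFTTrainer` still accepts `dataset_text_field` directly). The `target_modules` list, batch size, placeholder dataset, and `output_dir` are assumptions not stated in this card:

```python
# Sketch of the SFT stage using the hyperparameters listed above.
from datasets import Dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen2.5-3B-Instruct",
    max_seq_length=2048,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    # Common choice for Qwen-style models; not confirmed by this card.
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Placeholder dataset; the real data was HotpotQA plus synthetic QA.
train_dataset = Dataset.from_dict({"text": ["Q: Example question?\nA: Example answer."]})

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,  # assumption; not stated in the card
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        max_steps=200,
        optim="adamw_8bit",
        output_dir="outputs",
    ),
)
trainer.train()
```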
- **Preference Fine-Tuning (ORPO)**: Used to improve reasoning performance (a trainer sketch follows below)

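The ORPO stage is likewise unpublished; below is a minimal sketch assuming TRL's `ORPOTrainer` (in a trl version that accepts `tokenizer=`), reusing `model` and `tokenizer` from the SFT sketch, with `beta` and the preference examples as placeholders:

```python
# Hypothetical ORPO stage; dataset contents and beta are placeholders.
from datasets import Dataset
from trl import ORPOConfig, ORPOTrainer

pref_data = Dataset.from_dict({
    "prompt":   ["Which city hosted the 2012 Summer Olympics?"],
    "chosen":   ["London hosted the 2012 Summer Olympics."],
    "rejected": ["Paris hosted the 2012 Summer Olympics."],
})

trainer = ORPOTrainer(
    model=model,  # the SFT-tuned model from the previous sketch
    args=ORPOConfig(
        beta=0.1,  # odds-ratio loss weight; assumption
        learning_rate=2e-4,
        gradient_accumulation_steps=4,
        max_steps=200,
        output_dir="orpo-outputs",
    ),
    train_dataset=pref_data,
    tokenizer=tokenizer,
)
trainer.train()
```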
---
## 📁 Files Included
- `pytorch_model-00001-of-00002.bin` - Model weights (shard 1 of 2)
- `pytorch_model-00002-of-00002.bin` - Model weights (shard 2 of 2)
- `pytorch_model.bin.index.json` - Index mapping weight tensors to shards
- `config.json` - Model configuration
- `tokenizer.json` - Tokenizer configuration
- `tokenizer_config.json` - Tokenizer settings
- `merges.txt` - BPE merge rules
- `vocab.json` - Token vocabulary
- `special_tokens_map.json` - Special-token mapping
- `generation_config.json` - Default generation settings
- `unsloth.Q4_K_M.gguf` - **Quantized 4-bit version** for llama.cpp
- `unsloth.F16.gguf` - **16-bit version** for full-precision inference

---
## 🚀 Model Usage
### **Load Model in Python**
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "HasinduNimesh/YOUR_REPO_NAME"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

input_text = "What is the impact of DeepSeek R1 on AI research?"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
# max_new_tokens caps the generated tokens; max_length would also count the prompt.
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
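Because the model is instruction-tuned, wrapping the prompt in the tokenizer's chat template usually yields better answers than raw text. A minimal variant of the snippet above:

```python
# Reuses model and tokenizer from the previous snippet.
messages = [{"role": "user", "content": "What is the impact of DeepSeek R1 on AI research?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated portion.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```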
### **Use with llama.cpp (4-bit GGUF)**
```python
from llama_cpp import Llama

# n_ctx matches the 2048-token context used during fine-tuning.
llm = Llama(model_path="unsloth.Q4_K_M.gguf", n_ctx=2048)
prompt = "Summarize the latest research on AI safety."
output = llm(prompt, max_tokens=200)
print(output["choices"][0]["text"])
```
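If `unsloth.Q4_K_M.gguf` is not already on disk, it can be fetched from the Hub first and the returned path passed as `model_path` above (the repo id is the same placeholder used earlier):

```python
from huggingface_hub import hf_hub_download

# Downloads the quantized file and returns its local path.
gguf_path = hf_hub_download(
    repo_id="HasinduNimesh/YOUR_REPO_NAME",  # placeholder repo id
    filename="unsloth.Q4_K_M.gguf",
)
```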
---
## 🛠 Future Improvements
- **Improve dataset diversity**: Add more varied reasoning datasets
- **Optimize retrieval**: Enhance the FAISS & BM25 hybrid retrieval
- **Expand preference fine-tuning**: Improve the preference data used for ORPO

---
## 🛡️ License
This model is released under the **MIT License**, as declared in the model card metadata above. Please follow [Hugging Face’s guidelines](https://huggingface.co/docs/hub/models-the-hub) for responsible AI usage.

---
## 🤝 Acknowledgements
- **Unsloth**: For efficient Qwen fine-tuning
- **Hugging Face**: For model hosting & dataset tools
- **DeepSeek & Qwen Teams**: For providing the base models

---
_📢 For issues or improvements, please open a discussion on Hugging Face!_ 🚀