---
license: mit
language:
- en
base_model:
- Qwen/Qwen2.5-3B-Instruct
---
# Qwen2.5-3B-Instruct Fine-Tuned Model
## πŸ“Œ Model Overview
This repository contains a fine-tuned version of **Qwen2.5-3B-Instruct** using Unsloth. The model is optimized for **multi-hop reasoning, scientific Q&A, and retrieval-augmented generation (RAG)** with FAISS and BM25 retrieval.
- **Base Model**: [Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct)
- **Fine-Tuning Framework**: Unsloth
- **Quantization**: 4-bit GGUF & 16-bit versions available
- **Training Methods**: SFT (Supervised Fine-Tuning) + ORPO (Odds Ratio Preference Optimization)
---
## πŸ”₯ Fine-Tuning Details
### **1️⃣ Datasets Used**
- **HotpotQA**: Multi-hop reasoning dataset
- **Synthetic QA**: Created using extracted document chunks
- **BM25 & FAISS Retrieval**: Hybrid sparse + dense retrieval used to gather relevant document chunks (a retrieval sketch follows this list)
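The exact retrieval pipeline is not pinned down in this card, so the following is a minimal hybrid-retrieval sketch, assuming `rank_bm25` for sparse scoring, `sentence-transformers` for dense embeddings, and a flat FAISS inner-product index. The corpus, the embedding model, and the fusion weight `alpha` are placeholders, not the settings used for training:

```python
# Hypothetical hybrid retrieval sketch: BM25 (rank_bm25) + dense FAISS search.
import numpy as np
import faiss
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

docs = ["Paris is the capital of France.", "FAISS enables fast vector search."]

# Sparse index: BM25 over whitespace-tokenized documents.
bm25 = BM25Okapi([d.lower().split() for d in docs])

# Dense index: cosine similarity via inner product on L2-normalized embeddings.
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
embs = encoder.encode(docs, convert_to_numpy=True).astype("float32")
faiss.normalize_L2(embs)
index = faiss.IndexFlatIP(embs.shape[1])
index.add(embs)

def hybrid_search(query: str, k: int = 2, alpha: float = 0.5):
    """Blend normalized BM25 and dense scores; alpha weights the dense side."""
    sparse = bm25.get_scores(query.lower().split())
    sparse = sparse / (sparse.max() + 1e-8)
    q = encoder.encode([query], convert_to_numpy=True).astype("float32")
    faiss.normalize_L2(q)
    dense_scores, dense_ids = index.search(q, len(docs))
    dense = np.zeros(len(docs), dtype="float32")
    dense[dense_ids[0]] = dense_scores[0]
    combined = alpha * dense + (1 - alpha) * sparse
    top = np.argsort(-combined)[:k]
    return [(docs[i], float(combined[i])) for i in top]

print(hybrid_search("capital of France"))
```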
### **2️⃣ Training Configuration**
- **LoRA Fine-Tuning**: PEFT with Unsloth
- **Hyperparameters**:
- `r=16, lora_alpha=16, lora_dropout=0`
- `gradient_accumulation_steps=4`
- `max_seq_length=2048`
- `learning_rate=2e-4`
- `max_steps=200`
- `optimizer=adamw_8bit`
- **Preference Fine-Tuning (ORPO)**: Applied after SFT to improve reasoning performance; ORPO optimizes preference pairs directly rather than training a separate reward model (a minimal training sketch follows this list)
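The card lists the key hyperparameters but not the full training script, so here is a minimal SFT sketch assuming the standard Unsloth + trl workflow. The dataset file, `target_modules`, and per-device batch size are assumptions, and the `SFTTrainer` keyword style follows older trl releases (newer releases move these arguments into `SFTConfig`):

```python
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Load the base model in 4-bit with the 2048-token sequence length above.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen2.5-3B-Instruct",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters (PEFT) with the configuration listed above.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # assumed
)

# Placeholder: JSONL records pre-formatted into a single "text" column.
dataset = load_dataset("json", data_files="train.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,  # assumed; not stated in this card
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        max_steps=200,
        optim="adamw_8bit",
        output_dir="outputs",
        logging_steps=10,
    ),
)
trainer.train()
```

The ORPO stage would then reuse the same PEFT model with trl's `ORPOTrainer` on prompt/chosen/rejected preference pairs.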
---
## πŸ“ Files Included
- `pytorch_model-00001-of-00002.bin` - Model weights (shard 1 of 2)
- `pytorch_model-00002-of-00002.bin` - Model weights (shard 2 of 2)
- `pytorch_model.bin.index.json` - Shard index mapping weights to files
- `config.json` - Model configuration
- `tokenizer.json` - Serialized fast tokenizer
- `tokenizer_config.json` - Tokenizer configuration
- `merges.txt` - BPE merge rules
- `vocab.json` - Token vocabulary
- `special_tokens_map.json` - Special-token mapping
- `generation_config.json` - Default generation settings
- `unsloth.Q4_K_M.gguf` - **Quantized 4-bit version** for llama.cpp
- `unsloth.F16.gguf` - **16-bit version** for full-precision inference
---
## πŸš€ Model Usage
### **Load Model in Python**
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "HasinduNimesh/qwen3b-finetuned"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

input_text = "Why is it necessary to filter out chain-of-thought outputs with mixed languages, long paragraphs, and code blocks?"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

# max_new_tokens bounds only the generated continuation; max_length would
# also count prompt tokens and can silently truncate generation.
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
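Since Qwen2.5-3B-Instruct is chat-tuned, wrapping the prompt in the tokenizer's chat template generally produces better-behaved outputs than raw text. A minimal sketch, reusing the `model` and `tokenizer` objects from the snippet above:

```python
# Apply the Qwen chat template before generating.
messages = [{"role": "user", "content": "Explain multi-hop reasoning in one paragraph."}]
chat_text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(chat_text, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```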
### **Use with llama.cpp (4-bit GGUF)**
```python
from llama_cpp import Llama

# n_ctx matches the 2048-token sequence length used during fine-tuning.
llm = Llama(model_path="unsloth.Q4_K_M.gguf", n_ctx=2048)

prompt = "Summarize the latest research on AI safety."
output = llm(prompt, max_tokens=200)
print(output["choices"][0]["text"])
```
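If the GGUF file is not already on disk, it can be fetched from this repository first; a sketch using `huggingface_hub`, with the filename taken from the *Files Included* list above:

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download the quantized weights from this repo, then point Llama at the
# cached local path.
gguf_path = hf_hub_download(
    repo_id="HasinduNimesh/qwen3b-finetuned",
    filename="unsloth.Q4_K_M.gguf",
)
llm = Llama(model_path=gguf_path, n_ctx=2048)
```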
---
## πŸ›  Future Improvements
- **Improve dataset diversity**: Add more diverse reasoning datasets
- **Optimize retrieval**: Enhance FAISS & BM25 hybrid retrieval
- **Expand RL fine-tuning**: Improve reward models for ORPO
---
## πŸ›‘οΈ License
This model is available under the **MIT License**, as declared in the metadata above. Please follow [Hugging Face's guidelines](https://huggingface.co/docs/hub/models-the-hub) for responsible AI usage.
---
## 🀝 Acknowledgements
- **Unsloth**: For efficient Qwen fine-tuning
- **Hugging Face**: Model hosting & dataset tools
- **DeepSeek & Qwen Teams**: For providing base models
---
_πŸ“’ For issues or improvements, please open a discussion on Hugging Face!_ πŸš€