Qwen2.5-3B-Instruct Fine-Tuned Model
Model Overview
This repository contains a fine-tuned version of Qwen2.5-3B-Instruct using Unsloth. The model is optimized for multi-hop reasoning, scientific Q&A, and retrieval-augmented generation (RAG) with FAISS and BM25 retrieval.
- Base Model: Qwen2.5-3B-Instruct
- Fine-Tuning Framework: Unsloth
- Quantization: 4-bit GGUF & 16-bit versions available
- Training Methods: SFT (Supervised Fine-Tuning) + ORPO (Odds Ratio Preference Optimization)
Fine-Tuning Details
1. Datasets Used
- HotpotQA: Multi-hop reasoning dataset
- Synthetic QA: Created using extracted document chunks
- BM25 & FAISS Retrieval: Hybrid sparse + dense retrieval used to pull relevant document chunks for the QA examples (see the sketch below)
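The snippet below is a minimal sketch of the hybrid BM25 + FAISS retrieval described above, assuming the `rank_bm25`, `faiss`, and `sentence-transformers` packages. The toy corpus, the `all-MiniLM-L6-v2` encoder, and the 50/50 score weighting are illustrative assumptions, not the exact pipeline used to build the training data.

```python
import numpy as np
import faiss
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

# Toy corpus standing in for the extracted document chunks
corpus = [
    "Qwen2.5 is a family of instruction-tuned language models.",
    "FAISS performs efficient similarity search over dense vectors.",
    "BM25 ranks documents by lexical overlap with the query.",
]

# Sparse index: BM25 over whitespace-tokenized chunks
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])

# Dense index: FAISS inner-product search over normalized sentence embeddings
encoder = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = encoder.encode(corpus, normalize_embeddings=True)
index = faiss.IndexFlatIP(embeddings.shape[1])
index.add(np.asarray(embeddings, dtype="float32"))

def hybrid_search(query: str, k: int = 2, alpha: float = 0.5):
    """Blend normalized BM25 and dense scores and return the top-k chunks."""
    sparse = np.array(bm25.get_scores(query.lower().split()))
    sparse = sparse / (sparse.max() + 1e-9)
    q_emb = encoder.encode([query], normalize_embeddings=True)
    dense, ids = index.search(np.asarray(q_emb, dtype="float32"), len(corpus))
    dense_scores = np.zeros(len(corpus))
    dense_scores[ids[0]] = dense[0]
    combined = alpha * sparse + (1 - alpha) * dense_scores
    top = np.argsort(combined)[::-1][:k]
    return [corpus[i] for i in top]

print(hybrid_search("How does dense retrieval work?"))
```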
2. Training Configuration
- LoRA Fine-Tuning: PEFT adapters applied with Unsloth
- Hyperparameters (see the sketch below for how they map onto code):
  - r=16, lora_alpha=16, lora_dropout=0
  - gradient_accumulation_steps=4
  - max_seq_length=2048
  - learning_rate=2e-4
  - max_steps=200
  - optimizer=adamw_8bit
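The sketch below shows how these hyperparameters could map onto an Unsloth + TRL SFT run. The Unsloth base checkpoint name, target modules, per-device batch size, and `train_dataset` are illustrative assumptions rather than the exact training script, and depending on your TRL version some arguments (e.g. `max_seq_length`) may need to move into `SFTConfig`.

```python
from unsloth import FastLanguageModel
from transformers import TrainingArguments
from trl import SFTTrainer

# Load the base model in 4-bit and attach LoRA adapters with the settings above
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-3B-Instruct",  # assumed Unsloth base checkpoint
    max_seq_length=2048,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # assumption
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,  # hypothetical HotpotQA + synthetic QA dataset
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,   # assumption; not listed above
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        max_steps=200,
        optim="adamw_8bit",
        logging_steps=10,
        output_dir="outputs",
    ),
)
trainer.train()
```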
- RL Fine-Tuning (ORPO): Offline preference optimization used to improve reasoning performance (see the sketch below)
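A minimal sketch of the ORPO stage using TRL's `ORPOTrainer` follows. It assumes a preference dataset with `prompt` / `chosen` / `rejected` columns; the `beta` value, learning rate, and step count are illustrative assumptions rather than the exact settings used.

```python
from trl import ORPOConfig, ORPOTrainer

orpo_args = ORPOConfig(
    beta=0.1,                          # odds-ratio loss weight (assumption)
    max_length=2048,
    max_prompt_length=1024,            # assumption
    learning_rate=5e-6,                # assumption; usually lower than the SFT LR
    per_device_train_batch_size=1,     # assumption
    gradient_accumulation_steps=4,
    max_steps=100,                     # assumption
    output_dir="outputs-orpo",
)

orpo_trainer = ORPOTrainer(
    model=model,                       # LoRA model carried over from the SFT stage
    args=orpo_args,
    train_dataset=preference_dataset,  # hypothetical prompt/chosen/rejected pairs
    tokenizer=tokenizer,
)
orpo_trainer.train()
```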
Files Included
- pytorch_model-00001-of-00002.bin, pytorch_model-00002-of-00002.bin - Model weights (sharded)
- pytorch_model.bin.index.json - Index of model checkpoints
- config.json - Model configuration
- tokenizer.json, tokenizer_config.json - Tokenizer configuration
- merges.txt - BPE merge rules
- vocab.json - Token vocabulary
- special_tokens_map.json - Special tokens mapping
- generation_config.json - Default generation settings
- unsloth.Q4_K_M.gguf - Quantized 4-bit version for llama.cpp (see the download snippet below)
- unsloth.F16.gguf - 16-bit version for full-precision inference
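If you only need one of the GGUF files, `huggingface_hub` can fetch it directly; the local directory below is an assumption.

```python
from huggingface_hub import hf_hub_download

# Download only the 4-bit GGUF file from the model repository
gguf_path = hf_hub_download(
    repo_id="HasinduNimesh/qwen3b-finetuned",
    filename="unsloth.Q4_K_M.gguf",
    local_dir="models",  # assumption
)
print(gguf_path)
```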
Model Usage
Load Model in Python
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "HasinduNimesh/qwen3b-finetuned"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

input_text = "Why is it necessary to filter out chain-of-thought outputs with mixed languages, long paragraphs, and code blocks?"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

# Limit the number of newly generated tokens rather than the total sequence length
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
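Since the base model is instruction-tuned, wrapping the prompt with the tokenizer's chat template usually matches the training format more closely; a minimal variant of the generation call above:

```python
# Build a chat-formatted prompt and generate from it
messages = [{"role": "user", "content": input_text}]
chat_inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(chat_inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```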
Use with llama.cpp (4-bit GGUF)
```python
from llama_cpp import Llama

# Load the quantized GGUF with a 2048-token context window
llm = Llama(model_path="unsloth.Q4_K_M.gguf", n_ctx=2048)

prompt = "Summarize the latest research on AI safety."
output = llm(prompt, max_tokens=200)
print(output["choices"][0]["text"])
```
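llama-cpp-python also exposes a chat-style API that applies the chat template stored in the GGUF metadata; a minimal variant of the call above:

```python
# Chat-style completion using the template embedded in the GGUF
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": prompt}],
    max_tokens=200,
)
print(response["choices"][0]["message"]["content"])
```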
Future Improvements
- Improve dataset diversity: Add more diverse reasoning datasets
- Optimize retrieval: Enhance FAISS & BM25 hybrid retrieval
- Expand RL fine-tuning: Improve reward models for ORPO
License
This model is available under the Apache 2.0 License. Please follow Hugging Face's guidelines for responsible AI usage.
Acknowledgements
- Unsloth: For efficient Qwen fine-tuning
- Hugging Face: Model hosting & dataset tools
- DeepSeek & Qwen Teams: For providing base models
For issues or improvements, please open a discussion on Hugging Face!