Qwen2.5-3B-Instruct Fine-Tuned Model

πŸ“Œ Model Overview

This repository contains a fine-tuned version of Qwen2.5-3B-Instruct using Unsloth. The model is optimized for multi-hop reasoning, scientific Q&A, and retrieval-augmented generation (RAG) backed by FAISS and BM25.

  • Base Model: Qwen2.5-3B-Instruct
  • Fine-Tuning Framework: Unsloth
  • Quantization: 4-bit GGUF & 16-bit versions available
  • Training Methods: SFT (Supervised Fine-Tuning) + ORPO (Odds Ratio Preference Optimization)

πŸ”₯ Fine-Tuning Details

1️⃣ Datasets Used

  • HotpotQA: Multi-hop reasoning dataset
  • Synthetic QA: Question–answer pairs generated from extracted document chunks
  • BM25 & FAISS Retrieval: Used to retrieve the relevant context passages for each question (see the retrieval sketch below)
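
The exact retrieval pipeline is not published with the model; the sketch below shows one common way to blend BM25 (lexical) and FAISS (dense) scores for hybrid retrieval. The toy corpus, the rank_bm25 and sentence-transformers libraries, the all-MiniLM-L6-v2 encoder, and the blending weight are illustrative assumptions, not the exact setup used to build the training data.

# Minimal hybrid-retrieval sketch (illustrative, not the original pipeline).
import faiss
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

corpus = [
    "Qwen2.5-3B-Instruct is a 3B-parameter instruction-tuned language model.",
    "HotpotQA contains multi-hop questions that require combining several facts.",
    "ORPO optimizes preferences without training a separate reward model.",
]

# BM25 over whitespace-tokenized chunks (lexical signal).
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])

# FAISS over dense sentence embeddings (semantic signal).
encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
embeddings = encoder.encode(corpus, normalize_embeddings=True)
index = faiss.IndexFlatIP(embeddings.shape[1])  # inner product == cosine for normalized vectors
index.add(np.asarray(embeddings, dtype="float32"))

def hybrid_search(query: str, k: int = 2, alpha: float = 0.5):
    """Blend normalized BM25 and dense scores, return the top-k chunks."""
    bm25_scores = np.array(bm25.get_scores(query.lower().split()))
    bm25_scores = bm25_scores / (bm25_scores.max() + 1e-9)
    q_emb = encoder.encode([query], normalize_embeddings=True)
    dense_scores, ids = index.search(np.asarray(q_emb, dtype="float32"), len(corpus))
    dense_full = np.zeros(len(corpus))
    dense_full[ids[0]] = dense_scores[0]
    combined = alpha * bm25_scores + (1 - alpha) * dense_full
    return [corpus[i] for i in np.argsort(combined)[::-1][:k]]

print(hybrid_search("How does ORPO differ from RLHF?"))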

2️⃣ Training Configuration

  • LoRA Fine-Tuning: PEFT with Unsloth

  • Hyperparameters:

    • r=16, lora_alpha=16, lora_dropout=0
    • gradient_accumulation_steps=4
    • max_seq_length=2048
    • learning_rate=2e-4
    • max_steps=200
    • optimizer=adamw_8bit
  • Preference Fine-Tuning (ORPO): Applied after SFT to improve reasoning performance (a configuration sketch follows below)
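
A minimal Unsloth + TRL sketch wiring up the hyperparameters above. The dataset file, target_modules, per-device batch size, and the unsloth/Qwen2.5-3B-Instruct checkpoint name are illustrative assumptions, and it assumes a TRL version where SFTTrainer still accepts dataset_text_field and max_seq_length directly; it is not the exact training script.

# Illustrative SFT + LoRA configuration matching the hyperparameters above.
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-3B-Instruct",  # assumed checkpoint name
    max_seq_length=2048,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # assumed module list
)

dataset = load_dataset("json", data_files="my_synthetic_qa.jsonl", split="train")  # placeholder file

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,       # assumed
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        max_steps=200,
        optim="adamw_8bit",
        output_dir="outputs",
    ),
)
trainer.train()

The ORPO stage would follow the same pattern, swapping SFTTrainer for TRL's ORPOTrainer and a dataset of chosen/rejected response pairs.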


πŸ“ Files Included

  • pytorch_model-00001-of-00002.bin - Model weights (shard 1 of 2)
  • pytorch_model-00002-of-00002.bin - Model weights (shard 2 of 2)
  • pytorch_model.bin.index.json - Index mapping weights to shards
  • config.json - Model configuration
  • tokenizer.json - Serialized tokenizer
  • tokenizer_config.json - Tokenizer configuration
  • merges.txt - BPE merge rules
  • vocab.json - Token vocabulary
  • special_tokens_map.json - Special tokens mapping
  • generation_config.json - Default generation settings
  • unsloth.Q4_K_M.gguf - Quantized 4-bit (Q4_K_M) GGUF for llama.cpp
  • unsloth.F16.gguf - 16-bit GGUF for full-precision inference

πŸš€ Model Usage

Load Model in Python

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "HasinduNimesh/qwen3b-finetuned"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# Tokenize the prompt and move it to the same device as the model.
input_text = "Why is it necessary to filter out chain-of-thought outputs with mixed languages, long paragraphs, and code blocks?"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

# Generate up to 256 new tokens and decode the result.
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
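
Because the base model is instruction-tuned, prompts are usually wrapped in Qwen's chat template rather than passed as raw text. Below is a sketch using the standard tokenizer.apply_chat_template API, reusing the model and tokenizer loaded above; the system prompt and message contents are placeholders.

# Chat-style prompting via the tokenizer's chat template.
messages = [
    {"role": "system", "content": "You are a helpful research assistant."},
    {"role": "user", "content": "Explain multi-hop reasoning in one paragraph."},
]
# apply_chat_template wraps the conversation in the model's chat format and
# appends the assistant turn so generation continues as the assistant.
chat_inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)
chat_output = model.generate(chat_inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(chat_output[0][chat_inputs.shape[1]:], skip_special_tokens=True))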

Use with llama-cpp-python (4-bit GGUF)

from llama_cpp import Llama

# Load the quantized GGUF weights with a 2048-token context window.
llm = Llama(model_path="unsloth.Q4_K_M.gguf", n_ctx=2048)

prompt = "Summarize the latest research on AI safety."
output = llm(prompt, max_tokens=200)
print(output["choices"][0]["text"])

πŸ›  Future Improvements

  • Improve dataset diversity: Add more varied multi-hop and scientific reasoning datasets
  • Optimize retrieval: Tune the FAISS & BM25 hybrid retrieval (chunking, score weighting)
  • Expand preference fine-tuning: Build larger, higher-quality preference pairs for ORPO

πŸ›‘οΈ License

This model is available under the Apache 2.0 License. Please follow Hugging Face’s guidelines for responsible AI usage.


🀝 Acknowledgements

  • Unsloth: For efficient Qwen fine-tuning
  • Hugging Face: Model hosting & dataset tools
  • DeepSeek & Qwen Teams: For providing base models

πŸ“’ For issues or improvements, please open a discussion on Hugging Face! πŸš€
