---
|
base_model: unsloth/phi-4-unsloth-bnb-4bit |
|
tags: |
|
- text-generation-inference |
|
- transformers |
|
- unsloth |
|
- llama |
|
- trl |
|
license: apache-2.0 |
|
language: |
|
- en |
|
--- |
|
|
|
# About this model |
|
|
|
- **Developed by:** Haq Nawaz Malik |
|
- **License:** apache-2.0 |
|
- **Fine-tuned from model:** unsloth/phi-4-unsloth-bnb-4bit
|
|
|
# Fine-tuned Phi-4 Model Documentation |
|
|
|
## 🔹 Model Overview |
|
**Phi-4** is a transformer-based language model optimized for **natural language understanding and text generation**. We have fine-tuned it using **LoRA (Low-Rank Adaptation)** with the **Unsloth framework**, making it lightweight and efficient while preserving the base model's capabilities. |
|
|
|
## 🔹 Training Details |
|
### **🛠 Fine-tuning Methodology** |
|
We employed **LoRA (Low-Rank Adaptation)** for fine-tuning, which significantly reduces the number of trainable parameters while retaining the model’s expressive power. |
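To make the savings concrete: LoRA freezes each target weight matrix W (shape d×k) and trains only a low-rank update BA, with B of shape d×r and A of shape r×k. A quick back-of-the-envelope check (the 4096-dimensional shapes below are illustrative placeholders, not Phi-4's exact layer sizes):

```python
# Illustrative parameter count for LoRA on a single linear layer.
# The 4096 dimensions are placeholders, not Phi-4's actual shapes.
d, k, r = 4096, 4096, 16   # output dim, input dim, LoRA rank (r=16, as used here)
full_params = d * k        # trainable params if W were tuned directly
lora_params = r * (d + k)  # params in the low-rank factors A (r x k) and B (d x r)
print(f"full: {full_params:,}, lora: {lora_params:,}, ratio: {lora_params / full_params:.2%}")
# full: 16,777,216, lora: 131,072, ratio: 0.78%
```

At rank 16, the adapter trains well under 1% of the parameters that a full fine-tune of the same layer would touch.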
|
|
|
### **📑 Dataset Used** |
|
- **Dataset Name**: `mlabonne/FineTome-100k` |
|
- **Dataset Size**: 100,000 examples |
|
- **Data Format**: Conversational AI dataset with structured prompts and responses. |
|
- **Preprocessing**: The dataset was standardized with `unsloth.chat_templates.standardize_sharegpt()`, as sketched below.
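
A minimal sketch of that preprocessing step, following the usual Unsloth recipe (the `train` split name and the printed field are assumptions, not stated in the card):

```python
# Sketch of the dataset preparation; the split name "train" is an assumption.
from datasets import load_dataset
from unsloth.chat_templates import standardize_sharegpt

dataset = load_dataset("mlabonne/FineTome-100k", split="train")
dataset = standardize_sharegpt(dataset)  # maps ShareGPT "from"/"value" keys to "role"/"content"
print(dataset[0]["conversations"][:1])   # inspect the first turn of the first example
```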
|
|
|
### **🔢 Training Parameters** |
|
| Parameter | Value |
|------------------------|-------|
| LoRA Rank (`r`) | 16 |
| LoRA Alpha | 16 |
| LoRA Dropout | 0 |
| Target Modules | `q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj` |
| Max Sequence Length | 2048 |
| Load in 4-bit | True |
| Gradient Checkpointing | `unsloth` |
| Training Epochs | 10 |
| Optimizer | AdamW |
| Learning Rate | 2e-5 |
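
For reference, here is a hedged sketch of how these hyperparameters map onto a TRL `SFTTrainer` run. It assumes `model` is the base model wrapped with `FastLanguageModel.get_peft_model` using the LoRA settings above and `dataset` is the standardized dataset from the preprocessing sketch; batch size, logging, and output paths are placeholders the card does not specify, and newer TRL versions move `dataset_text_field` and `max_seq_length` into `SFTConfig`:

```python
# Hedged sketch of the training setup implied by the table above.
# Batch size, logging_steps, and output_dir are placeholders, not from the card.
from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model=model,                        # base model wrapped with get_peft_model
    tokenizer=tokenizer,
    train_dataset=dataset,              # standardized FineTome-100k dataset
    dataset_text_field="text",          # assumes a chat-templated text column
    max_seq_length=2048,                # from the table
    args=TrainingArguments(
        per_device_train_batch_size=2,  # placeholder
        num_train_epochs=10,            # from the table
        learning_rate=2e-5,             # from the table
        optim="adamw_torch",            # AdamW, per the table
        logging_steps=10,
        output_dir="outputs",
    ),
)
trainer.train()
```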
|
|
|
## 🔹 How to Load the Model |
|
To load the fine-tuned model, use the **Unsloth framework**: |
|
|
|
```python
from unsloth import FastLanguageModel

model_name = "Omarrran/lora_model"
max_seq_length = 2048
load_in_4bit = True  # 4-bit quantization keeps memory usage low

# Load the model and tokenizer. Because the repository contains a LoRA
# adapter, Unsloth resolves the base model and attaches the fine-tuned
# adapter weights automatically; there is no need to re-create the adapter
# with get_peft_model (that would add a fresh, untrained adapter instead).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    load_in_4bit=load_in_4bit,
)

# Switch Unsloth into its optimized inference mode
FastLanguageModel.for_inference(model)
```
|
> **Note:** A CUDA-capable GPU is required; Unsloth's 4-bit loading does not run on CPU-only machines.
|
## 🔹 Deploying the Model |
|
### **🚀 Using Google Colab** |
|
1. Install the dependencies:

   ```bash
   pip install gradio transformers torch unsloth peft
   ```

2. Load the model using the script above.

3. Run inference, either with the minimal smoke test below or through the chatbot interface in the next section.
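
A minimal smoke test, assuming `model` and `tokenizer` come from the loading script above:

```python
# Minimal generation smoke test (assumes `model` and `tokenizer` from above).
from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(tokenizer, chat_template="phi-4")
messages = [{"role": "user", "content": "Explain LoRA in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids=input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, not the echoed prompt
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```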
|
|
|
### **🚀 Deploy on Hugging Face Spaces** |
|
1. Save the chatbot script as `app.py`.

2. Create a `requirements.txt` file with:

   ```
   gradio
   transformers
   torch
   unsloth
   peft
   ```

3. Upload both files to a new **Hugging Face Space**.

4. Select **Gradio** as the Space SDK and choose GPU hardware; the Space builds and deploys automatically.
|
|
|
## 🔹 Using the Model |
|
### **🗨 Chatbot Interface (Gradio UI)** |
|
To interact with the fine-tuned model using **Gradio**, use: |
|
|
|
```python
import gradio as gr
from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template

# Load the fine-tuned model with Unsloth
model_name = "Omarrran/lora_model"  # Change this if needed
max_seq_length = 2048
load_in_4bit = True  # Use 4-bit quantization to save memory

# Loading the adapter repository attaches the fine-tuned LoRA weights
# to the base model automatically
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    load_in_4bit=load_in_4bit,
)
FastLanguageModel.for_inference(model)  # enable Unsloth's fast inference mode

# Apply the Phi-4 chat formatting template
tokenizer = get_chat_template(tokenizer, chat_template="phi-4")

# Chat function
def chat_with_model(user_input):
    try:
        # Format the user turn with the Phi-4 chat template
        messages = [{"role": "user", "content": user_input}]
        input_ids = tokenizer.apply_chat_template(
            messages, add_generation_prompt=True, return_tensors="pt"
        ).to(model.device)
        output = model.generate(input_ids=input_ids, max_new_tokens=200)
        # Decode only the newly generated tokens, not the echoed prompt
        return tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
    except Exception as e:
        return f"Error: {str(e)}"

# Define the Gradio interface
description = """
### 🧠 Phi-4 Conversational AI Chatbot
This chatbot is powered by **Unsloth's Phi-4 model** with a **LoRA fine-tuned adapter**.

#### 🔹 Features:
✅ **Lightweight LoRA adapter for efficiency**
✅ **Supports long-context conversations (2048 tokens)**
✅ **Optimized with 4-bit quantization for fast inference**

#### 🔹 Example Questions:
- "What is the capital of France?"
- "Tell me a joke!"
- "Explain black holes in simple terms."
"""

examples = [
    "Hello, how are you?",
    "What is the capital of France?",
    "Tell me a joke!",
    "What is quantum physics?",
    "Translate 'Hello' to French.",
]

# Launch the Gradio UI
demo = gr.Interface(
    fn=chat_with_model,
    inputs=gr.Textbox(label="Your Message", placeholder="Type something here..."),
    outputs=gr.Textbox(label="Chatbot's Response"),
    title="🔹 HNM_Phi_4_finetuned",
    description=description,
    examples=examples,
    allow_flagging="never",
)

if __name__ == "__main__":
    demo.launch()
```
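
Once the Space is running, it can also be queried programmatically with `gradio_client`; the Space id below is a placeholder for your own:

```python
# Query the deployed Space programmatically; the Space id is a placeholder.
from gradio_client import Client

client = Client("your-username/HNM_Phi_4_finetuned")  # replace with your Space id
reply = client.predict("What is the capital of France?", api_name="/predict")
print(reply)
```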
|
|
|
## 📌 Conclusion |
|
This **fine-tuned Phi-4 model** combines **LoRA fine-tuning** with **Unsloth's 4-bit quantization** to deliver capable conversational AI at a fraction of the usual memory cost. The model is **lightweight, memory-efficient**, and suitable for chatbot applications in both **research and production environments**.
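
If you continue training the adapter, Unsloth's save helpers make the result easy to redistribute; a sketch (method names as in recent Unsloth releases; check the documentation for your installed version):

```python
# Hedged sketch: exporting after (re)training, using Unsloth's save helpers.
model.save_pretrained("lora_model")     # adapter weights only (small upload)
tokenizer.save_pretrained("lora_model")
# Or merge the adapter into the base weights for standalone serving:
model.save_pretrained_merged("merged_model", tokenizer, save_method="merged_16bit")
```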
|
|
|
|
|
|
|
|