---
|
base_model: unsloth/phi-4-unsloth-bnb-4bit |
|
tags: |
|
- text-generation-inference |
|
- transformers |
|
- unsloth |
|
- llama |
|
- trl |
|
license: apache-2.0 |
|
language: |
|
- en |
|
--- |
|
|
|
# About this model |
|
|
|
- **Developed by:** Haq Nawaz Malik |
|
- **License:** apache-2.0 |
|
- **Fine-tuned from model:** unsloth/phi-4-unsloth-bnb-4bit
|
|
|
# Fine-tuned Phi-4 Model Documentation |
|
|
|
## 🔹 Model Overview |
|
**Phi-4** is a transformer-based language model optimized for **natural language understanding and text generation**. We have fine-tuned it using **LoRA (Low-Rank Adaptation)** with the **Unsloth framework**, making it lightweight and efficient while preserving the base model's capabilities. |
|
|
|
## 🔹 Training Details |
|
### **🛠 Fine-tuning Methodology** |
|
We employed **LoRA (Low-Rank Adaptation)** for fine-tuning, which significantly reduces the number of trainable parameters while retaining the model’s expressive power. |
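To make the savings concrete: LoRA freezes each target weight matrix W (shape d×k) and trains only a low-rank update BA, with B of shape d×r and A of shape r×k. A quick back-of-the-envelope check (the 4096-dimensional shapes below are illustrative placeholders, not Phi-4's exact layer sizes):

```python
# Illustrative parameter count for LoRA on a single linear layer.
# The 4096 dimensions are placeholders, not Phi-4's actual shapes.
d, k, r = 4096, 4096, 16   # output dim, input dim, LoRA rank (r=16, as used here)
full_params = d * k        # trainable params if W were tuned directly
lora_params = r * (d + k)  # params in the low-rank factors A (r x k) and B (d x r)
print(f"full: {full_params:,}, lora: {lora_params:,}, ratio: {lora_params / full_params:.2%}")
# full: 16,777,216, lora: 131,072, ratio: 0.78%
```

At rank 16, the adapter trains well under 1% of the parameters that a full fine-tune of the same layer would touch.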
|
|
|
### **📑 Dataset Used** |
|
- **Dataset Name**: `mlabonne/FineTome-100k` |
|
- **Dataset Size**: 100,000 examples |
|
- **Data Format**: Conversational AI dataset with structured prompts and responses. |
|
- **Preprocessing**: The dataset was standardized with `unsloth.chat_templates.standardize_sharegpt()`, as sketched below.
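
A minimal sketch of that preprocessing step, following the usual Unsloth recipe (the `train` split name and the printed field are assumptions, not stated in the card):

```python
# Sketch of the dataset preparation; the split name "train" is an assumption.
from datasets import load_dataset
from unsloth.chat_templates import standardize_sharegpt

dataset = load_dataset("mlabonne/FineTome-100k", split="train")
dataset = standardize_sharegpt(dataset)  # maps ShareGPT "from"/"value" keys to "role"/"content"
print(dataset[0]["conversations"][:1])   # inspect the first turn of the first example
```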
|
|
|
### **🔢 Training Parameters** |
|
| Parameter | Value |
|------------------------|-------|
| LoRA Rank (`r`) | 16 |
| LoRA Alpha | 16 |
| LoRA Dropout | 0 |
| Target Modules | `q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj` |
| Max Sequence Length | 2048 |
| Load in 4-bit | True |
| Gradient Checkpointing | `unsloth` |
| Training Epochs | 10 |
| Optimizer | AdamW |
| Learning Rate | 2e-5 |
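
For reference, here is a hedged sketch of how these hyperparameters map onto a TRL `SFTTrainer` run. It assumes `model` is the base model wrapped with `FastLanguageModel.get_peft_model` using the LoRA settings above and `dataset` is the standardized dataset from the preprocessing sketch; batch size, logging, and output paths are placeholders the card does not specify, and newer TRL versions move `dataset_text_field` and `max_seq_length` into `SFTConfig`:

```python
# Hedged sketch of the training setup implied by the table above.
# Batch size, logging_steps, and output_dir are placeholders, not from the card.
from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model=model,                        # base model wrapped with get_peft_model
    tokenizer=tokenizer,
    train_dataset=dataset,              # standardized FineTome-100k dataset
    dataset_text_field="text",          # assumes a chat-templated text column
    max_seq_length=2048,                # from the table
    args=TrainingArguments(
        per_device_train_batch_size=2,  # placeholder
        num_train_epochs=10,            # from the table
        learning_rate=2e-5,             # from the table
        optim="adamw_torch",            # AdamW, per the table
        logging_steps=10,
        output_dir="outputs",
    ),
)
trainer.train()
```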
|
|
|
## 🔹 How to Load the Model |
|
To load the fine-tuned model, use the **Unsloth framework**: |
|
|
|
```python
from unsloth import FastLanguageModel

model_name = "Omarrran/lora_model"
max_seq_length = 2048
load_in_4bit = True  # 4-bit quantization keeps memory usage low

# Load the model and tokenizer. Because the repository contains a LoRA
# adapter, Unsloth resolves the base model and attaches the fine-tuned
# adapter weights automatically; there is no need to re-create the adapter
# with get_peft_model (that would add a fresh, untrained adapter instead).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    load_in_4bit=load_in_4bit,
)

# Switch Unsloth into its optimized inference mode
FastLanguageModel.for_inference(model)
```
|
> **Note:** A CUDA-capable GPU is required; Unsloth's 4-bit loading does not run on CPU-only machines.
|
## 🔹 Deploying the Model |
|
### **🚀 Using Google Colab** |
|
1. Install the dependencies:

   ```bash
   pip install gradio transformers torch unsloth peft
   ```

2. Load the model using the script above.

3. Run inference, either with the minimal smoke test below or through the chatbot interface in the next section.
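
A minimal smoke test, assuming `model` and `tokenizer` come from the loading script above:

```python
# Minimal generation smoke test (assumes `model` and `tokenizer` from above).
from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(tokenizer, chat_template="phi-4")
messages = [{"role": "user", "content": "Explain LoRA in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids=input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, not the echoed prompt
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```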
|
|
|
### **🚀 Deploy on Hugging Face Spaces** |
|
1. Save the chatbot script as `app.py`.

2. Create a `requirements.txt` file with:

   ```
   gradio
   transformers
   torch
   unsloth
   peft
   ```

3. Upload both files to a new **Hugging Face Space**.

4. Select **Gradio** as the Space SDK and choose GPU hardware; the Space builds and deploys automatically.
|
|
|
## 🔹 Using the Model |
|
### **🗨 Chatbot Interface (Gradio UI)** |
|
To interact with the fine-tuned model using **Gradio**, use: |
|
|
|
```python
import gradio as gr
from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template

# Load the fine-tuned model with Unsloth
model_name = "Omarrran/lora_model"  # Change this if needed
max_seq_length = 2048
load_in_4bit = True  # Use 4-bit quantization to save memory

# Loading the adapter repository attaches the fine-tuned LoRA weights
# to the base model automatically
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    load_in_4bit=load_in_4bit,
)
FastLanguageModel.for_inference(model)  # enable Unsloth's fast inference mode

# Apply the Phi-4 chat formatting template
tokenizer = get_chat_template(tokenizer, chat_template="phi-4")

# Chat function
def chat_with_model(user_input):
    try:
        # Format the user turn with the Phi-4 chat template
        messages = [{"role": "user", "content": user_input}]
        input_ids = tokenizer.apply_chat_template(
            messages, add_generation_prompt=True, return_tensors="pt"
        ).to(model.device)
        output = model.generate(input_ids=input_ids, max_new_tokens=200)
        # Decode only the newly generated tokens, not the echoed prompt
        return tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
    except Exception as e:
        return f"Error: {str(e)}"

# Define the Gradio interface
description = """
### 🧠 Phi-4 Conversational AI Chatbot
This chatbot is powered by **Unsloth's Phi-4 model** with a **LoRA fine-tuned adapter**.

#### 🔹 Features:
✅ **Lightweight LoRA adapter for efficiency**
✅ **Supports long-context conversations (2048 tokens)**
✅ **Optimized with 4-bit quantization for fast inference**

#### 🔹 Example Questions:
- "What is the capital of France?"
- "Tell me a joke!"
- "Explain black holes in simple terms."
"""

examples = [
    "Hello, how are you?",
    "What is the capital of France?",
    "Tell me a joke!",
    "What is quantum physics?",
    "Translate 'Hello' to French.",
]

# Launch the Gradio UI
demo = gr.Interface(
    fn=chat_with_model,
    inputs=gr.Textbox(label="Your Message", placeholder="Type something here..."),
    outputs=gr.Textbox(label="Chatbot's Response"),
    title="🔹 HNM_Phi_4_finetuned",
    description=description,
    examples=examples,
    allow_flagging="never",
)

if __name__ == "__main__":
    demo.launch()
```
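
Once the Space is running, it can also be queried programmatically with `gradio_client`; the Space id below is a placeholder for your own:

```python
# Query the deployed Space programmatically; the Space id is a placeholder.
from gradio_client import Client

client = Client("your-username/HNM_Phi_4_finetuned")  # replace with your Space id
reply = client.predict("What is the capital of France?", api_name="/predict")
print(reply)
```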
|
|
|
## 📌 Conclusion |
|
This **fine-tuned Phi-4 model** combines **LoRA fine-tuning** with **Unsloth's 4-bit quantization** to deliver capable conversational AI at a fraction of the usual memory cost. The model is **lightweight, memory-efficient**, and suitable for chatbot applications in both **research and production environments**.
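
If you continue training the adapter, Unsloth's save helpers make the result easy to redistribute; a sketch (method names as in recent Unsloth releases; check the documentation for your installed version):

```python
# Hedged sketch: exporting after (re)training, using Unsloth's save helpers.
model.save_pretrained("lora_model")     # adapter weights only (small upload)
tokenizer.save_pretrained("lora_model")
# Or merge the adapter into the base weights for standalone serving:
model.save_pretrained_merged("merged_model", tokenizer, save_method="merged_16bit")
```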
|
|
|
|
|
|
|
|