About this model

  • Developed by: Haq Nawaz Malik
  • License: apache-2.0
  • Finetuned from model: unsloth/phi-4-unsloth-bnb-4bit

Fine-tuned Phi-4 Model Documentation

🔹 Model Overview

Phi-4 is a transformer-based language model optimized for natural language understanding and text generation. We have fine-tuned it using LoRA (Low-Rank Adaptation) with the Unsloth framework, making it lightweight and efficient while preserving the base model's capabilities.

🔹 Training Details

🛠 Fine-tuning Methodology

We employed LoRA (Low-Rank Adaptation) for fine-tuning, which significantly reduces the number of trainable parameters while retaining the model’s expressive power.
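
To see why this is cheap, here is a rough back-of-the-envelope sketch (illustrative only, not Unsloth's internal implementation): for a frozen weight matrix W of shape d_out × d_in, LoRA trains two small factors B (d_out × r) and A (r × d_in) and uses W + (alpha / r) · B·A, so only r · (d_in + d_out) parameters are updated per adapted matrix.

import torch

# Dimensions are illustrative; r and alpha match this model's LoRA config
d_in, d_out, r, alpha = 4096, 4096, 16, 16

W = torch.randn(d_out, d_in)           # frozen pretrained weight
A = torch.randn(r, d_in) * 0.01        # trainable low-rank factor
B = torch.zeros(d_out, r)              # trainable low-rank factor (starts at zero)

W_adapted = W + (alpha / r) * (B @ A)  # effective weight after LoRA

full_params = d_out * d_in
lora_params = r * (d_in + d_out)
print(f"Trainable per matrix: {lora_params:,} vs {full_params:,} "
      f"({lora_params / full_params:.2%} of full fine-tuning)")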

📑 Dataset Used

  • Dataset Name: mlabonne/FineTome-100k
  • Dataset Size: 100,000 examples
  • Data Format: Conversational AI dataset with structured prompts and responses.
  • Preprocessing: The dataset was standardized using unsloth.chat_templates.standardize_sharegpt()
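
The exact preprocessing script is not included in this card, but a minimal sketch following the standard Unsloth recipe (model and template names taken from this card; the formatting function is an assumption) looks like this:

from datasets import load_dataset
from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template, standardize_sharegpt

# Load the base model's tokenizer (base model from "Finetuned from model" above)
model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/phi-4-unsloth-bnb-4bit", max_seq_length=2048, load_in_4bit=True
)
tokenizer = get_chat_template(tokenizer, chat_template="phi-4")

dataset = load_dataset("mlabonne/FineTome-100k", split="train")

# Normalize ShareGPT-style conversations into role/content messages
dataset = standardize_sharegpt(dataset)

def formatting_prompts_func(examples):
    texts = [
        tokenizer.apply_chat_template(convo, tokenize=False, add_generation_prompt=False)
        for convo in examples["conversations"]
    ]
    return {"text": texts}

dataset = dataset.map(formatting_prompts_func, batched=True)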

🔢 Training Parameters

| Parameter | Value |
| --- | --- |
| LoRA Rank (r) | 16 |
| LoRA Alpha | 16 |
| LoRA Dropout | 0 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Max Sequence Length | 2048 |
| Load in 4-bit | True |
| Gradient Checkpointing | unsloth |
| Fine-tuning Duration | 10 epochs |
| Optimizer Used | AdamW |
| Learning Rate | 2e-5 |
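
These hyperparameters map naturally onto a TRL SFTTrainer setup. The sketch below is illustrative only: batch size, gradient accumulation, and the output directory are placeholders not stated in this card, and argument names follow the TRL version used in Unsloth's notebooks.

from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model=model,                          # LoRA-wrapped model from get_peft_model
    tokenizer=tokenizer,
    train_dataset=dataset,                # "text" column produced by the formatting step
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        num_train_epochs=10,              # fine-tuning duration from the table above
        learning_rate=2e-5,
        optim="adamw_torch",              # AdamW
        per_device_train_batch_size=2,    # placeholder, not stated in this card
        gradient_accumulation_steps=4,    # placeholder, not stated in this card
        logging_steps=10,
        output_dir="outputs",
    ),
)
trainer.train()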

🔹 How to Load the Model

To load the fine-tuned model, use the Unsloth framework:

from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template

model_name = "Omarrran/lora_model"
max_seq_length = 2048
load_in_4bit = True

# Load model and tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    load_in_4bit=load_in_4bit
)

# Attach a LoRA adapter. Note: if the repository above already contains the
# fine-tuned adapter weights, FastLanguageModel.from_pretrained loads them, and
# this step is only needed to add a fresh adapter for further training.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth"
)

# Apply the Phi-4 chat formatting template
tokenizer = get_chat_template(tokenizer, chat_template="phi-4")

Note: Run this on a GPU; 4-bit loading with Unsloth requires CUDA.
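
With the model loaded, a minimal inference sketch looks like the following (the prompt and generation settings are illustrative):

# Switch Unsloth into its fast inference mode
FastLanguageModel.for_inference(model)

messages = [{"role": "user", "content": "Explain LoRA in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to("cuda")

output = model.generate(input_ids=input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))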

🔹 Deploying the Model

🚀 Using Google Colab

  1. Install dependencies:
    pip install gradio transformers torch unsloth peft
    
  2. Load the model using the script above.
  3. Run inference using the chatbot interface.

🚀 Deploy on Hugging Face Spaces

  1. Save the script as app.py.
  2. Create a requirements.txt file with:
    gradio
    transformers
    torch
    unsloth
    peft
    
  3. Upload the files to a new Hugging Face Space.
  4. Choose the Gradio SDK when creating the Space; it will build and launch automatically.
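
Hugging Face Spaces reads the Space configuration from the YAML header at the top of the Space's README.md. A minimal example is shown below; the title, emoji, and sdk_version are placeholders to adjust for your Space:

---
title: HNM Phi 4 Finetuned
emoji: 🧠
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
---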

🔹 Using the Model

🗨 Chatbot Interface (Gradio UI)

To interact with the fine-tuned model using Gradio, use:

import gradio as gr
import torch
from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template

# Load the Base Model with Unsloth
model_name = "Omarrran/lora_model"  # Change this if needed
max_seq_length = 2048
load_in_4bit = True  # Use 4-bit quantization to save memory

# Load model and tokenizer
base_model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    load_in_4bit=load_in_4bit
)

# Attach the LoRA adapter. If the repository above already contains the fine-tuned
# adapter weights, FastLanguageModel.from_pretrained loads them; this call is only
# needed to add a fresh adapter for further training.
model = FastLanguageModel.get_peft_model(
    base_model,
    r=16,  
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth"
)

# Apply Chat Formatting Template
tokenizer = get_chat_template(tokenizer, chat_template="phi-4")

# Switch Unsloth into its fast inference mode
FastLanguageModel.for_inference(model)

# Chat Function
def chat_with_model(user_input):
    try:
        # Format the user message with the Phi-4 chat template
        messages = [{"role": "user", "content": user_input}]
        input_ids = tokenizer.apply_chat_template(
            messages,
            tokenize=True,
            add_generation_prompt=True,
            return_tensors="pt",
        ).to(model.device)
        output = model.generate(input_ids=input_ids, max_new_tokens=200)
        # Decode only the newly generated tokens, not the echoed prompt
        response = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
        return response
    except Exception as e:
        return f"Error: {str(e)}"

# Define Gradio Interface
description = """
### 🧠 Phi-4 Conversational AI Chatbot
This chatbot is powered by **Unsloth's Phi-4 model**, optimized with **LoRA fine-tuning**.

#### 🔹 Features:
✅ **Lightweight LoRA adapter for efficiency**  
✅ **Supports long-context conversations (2048 tokens)**  
✅ **Optimized with 4-bit quantization for fast inference**

#### 🔹 Example Questions:
- "What is the capital of France?"
- "Tell me a joke!"
- "Explain black holes in simple terms."
"""

examples = [
    "Hello, how are you?",
    "What is the capital of France?",
    "Tell me a joke!",
    "What is quantum physics?",
    "Translate 'Hello' to French."
]

# Launch Gradio UI
demo = gr.Interface(
    fn=chat_with_model,
    inputs=gr.Textbox(label="Your Message", placeholder="Type something here..."),
    outputs=gr.Textbox(label="Chatbot's Response"),
    title="🔹 HNM_Phi_4_finetuned",
    description=description,
    examples=examples,
    allow_flagging="never"
)

if __name__ == "__main__":
    demo.launch()

📌 Conclusion

This fine-tuned Phi-4 model delivers optimized conversational AI capabilities using LoRA fine-tuning and Unsloth’s 4-bit quantization. The model is lightweight, memory-efficient, and suitable for chatbot applications in both research and production environments.
