Phi-4-mini N5 Complaint Categorization Fine-tune

This model is a fine-tuned version of microsoft/Phi-4-mini-instruct optimized for categorizing citizen complaints into predefined topic and experience labels, trained on the signaalberichten dataset.

Model Details

Model Description

  • Developed by: UWV InnovatieHub
  • Model type: Causal Language Model with LoRA fine-tuning
  • Language(s): Dutch (nl)
  • License: MIT
  • Finetuned from: microsoft/Phi-4-mini-instruct (3.82B parameters)
  • Training Framework: Unsloth (memory-efficient LoRA fine-tuning)

Training Details

  • Dataset: UWV/wim_instruct_signaalberichten_to_jsonld_agent_steps
  • Dataset Size: 4,525 N5-specific examples (label addition tasks)
  • Training Duration: 1 hour 44 minutes
  • Hardware: NVIDIA A100 80GB
  • Epochs: 3.1
  • Steps: 1,735
  • Training Metrics:
    • Final Training Loss: 0.7864
    • Final Eval Loss: 0.7796
    • Training samples/second: 2.209
    • Learning rate (final): 6.26e-10

LoRA Configuration

{
    "r": 512,                    # Large rank for quality
    "lora_alpha": 1024,         # Alpha (2:1 ratio)
    "lora_dropout": 0.1,        # Higher dropout for small dataset
    "bias": "none",
    "task_type": "CAUSAL_LM",
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj"  # Attention layers only
    ]
}
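
Outside Unsloth, these settings map one-to-one onto peft's LoraConfig. A minimal sketch, assuming you want to reproduce the setup with plain PEFT:

from peft import LoraConfig

# Mirrors the configuration above; task type and target modules
# are taken directly from the dict.
lora_config = LoraConfig(
    r=512,
    lora_alpha=1024,
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)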

Training Configuration

{
    "model": "phi4-mini",
    "max_seq_length": 4096,
    "batch_size": 8,
    "gradient_accumulation_steps": 1,
    "effective_batch_size": 8,
    "learning_rate": 2e-5,
    "warmup_steps": 50,
    "max_grad_norm": 1.0,
    "lr_scheduler": "cosine",
    "optimizer": "paged_adamw_8bit",
    "bf16": True,
    "seed": 42
}
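
These values map directly onto transformers' TrainingArguments. A sketch for reproduction; output_dir and num_train_epochs are assumptions not listed in the original config:

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="wim-n5-phi4-mini",      # assumption: any local path works
    num_train_epochs=3,                 # assumption: the run above stopped at 3.1 epochs
    per_device_train_batch_size=8,
    gradient_accumulation_steps=1,      # effective batch size: 8
    learning_rate=2e-5,
    warmup_steps=50,
    max_grad_norm=1.0,
    lr_scheduler_type="cosine",
    optim="paged_adamw_8bit",
    bf16=True,
    seed=42,
)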

Intended Uses & Limitations

Intended Uses

  • Complaint Categorization: Classify citizen complaints into topic and experience categories
  • Municipal Service Analysis: Analyze phone transcripts and written complaints
  • Topic Detection: Identify what the complaint is about (e.g., waste, parking, permits)
  • Experience Analysis: Determine how citizens experience the service (e.g., communication, speed, clarity)

Limitations

  • Trained on signaalberichten dataset (Dutch municipal complaints)
  • Fixed label vocabulary (cannot create new labels; see the post-processing sketch after this list)
  • Best performance on complaint/service interaction texts
  • Limited to 4K token context (sufficient for most complaints)
  • Specific to Dutch government/municipal contexts
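
Because the vocabulary is fixed, it is safest to drop any label the model produces outside the allowed sets. A minimal post-processing sketch; the ALLOWED_* sets are hypothetical placeholders, and the real lists come from the signaalberichten dataset:

# Hypothetical label vocabulary check; populate the sets from the dataset.
ALLOWED_ONDERWERP = {"Vuil/ongedierte overlast", "Bruikbaarheid/beschikbaarheid afvalcontainers"}  # ...full list
ALLOWED_BELEVING = {"Communicatie", "Op de hoogte houden", "Statusinformatie"}  # ...full list

def filter_labels(result: dict) -> dict:
    # Keep only labels that exist in the fixed vocabulary.
    result["onderwerp_labels"] = [l for l in result["onderwerp_labels"] if l in ALLOWED_ONDERWERP]
    result["beleving_labels"] = [l for l in result["beleving_labels"] if l in ALLOWED_BELEVING]
    return result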

How to Use

Option 1: Using the Merged Model (Recommended)

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
import json

# Load the merged model (ready to use)
model = AutoModelForCausalLM.from_pretrained(
    "UWV/wim-n5-phi4-mini-merged",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained("UWV/wim-n5-phi4-mini-merged")

# Prepare input - complaint text for categorization
complaint_text = """
Burger: Nou, waar ik dus over wil klagen is het afval in de buurt. 
Het is echt niet normaal meer, met al die vuilniszakken die op straat worden gegooid. 
De containers zijn vaak vol en er komen ook ratten. 
Ik had al eens gebeld maar er wordt niks aan gedaan!
"""

messages = [
    {
        "role": "system", 
        "content": "Jij bent een expert in het toewijzen van labels aan een tekst."
    },
    {
        "role": "user", 
        "content": f"""Analyseer de onderstaande tekst en bepaal welke labels van toepassing zijn.

**Onderwerp labels** (selecteer wat van toepassing is):
Vuil/ongedierte overlast, Bruikbaarheid/beschikbaarheid afvalcontainers, 
Parkeeroverlast, Vergunningen, etc.

**Beleving labels** (selecteer wat van toepassing is):
Communicatie, Op de hoogte houden, Statusinformatie, Snelheid van afhandeling, etc.

**Tekst om te analyseren**:
{complaint_text}"""
    }
]

# Apply chat template and generate
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=4096)
inputs = {k: v.to(model.device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=1000,
        temperature=0.1,  # Low temperature for consistent labeling
        do_sample=True,
        top_p=0.95,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens (skip the echoed prompt)
generated = outputs[0][inputs["input_ids"].shape[1]:]
response = tokenizer.decode(generated, skip_special_tokens=True).strip()
print(response)

# The model is trained to return JSON (see "Expected Output Format" below)
result = json.loads(response)

Option 2: Using the LoRA Adapter

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-4-mini-instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

# Load adapter
model = PeftModel.from_pretrained(
    base_model,
    "UWV/wim-n5-phi4-mini-adapter"
)
tokenizer = AutoTokenizer.from_pretrained("UWV/wim-n5-phi4-mini-adapter")

# Use same inference code as above...
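
If you want a standalone copy of the weights locally (note the merge caveat under Model Versions below), PEFT can fold the adapter into the base model. A sketch; the output path is an assumption:

# Fold the adapter into the base weights for adapter-free inference.
merged = model.merge_and_unload()
merged.save_pretrained("wim-n5-phi4-mini-merged-local")   # assumption: any local path
tokenizer.save_pretrained("wim-n5-phi4-mini-merged-local")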

Expected Output Format

The model outputs a JSON response with categorization results:

{
    "reasoning": "Omdat de burger klaagt over afval dat op straat wordt gegooid, volle containers en rattenoverlast, zijn de onderwerpen 'Vuil/ongedierte overlast' en 'Bruikbaarheid/beschikbaarheid afvalcontainers' het meest van toepassing. De beleving is negatief: de burger ervaart frustratie over het uitblijven van actie en het gebrek aan terugkoppeling.",
    "onderwerp_labels": [
        "Vuil/ongedierte overlast",
        "Bruikbaarheid/beschikbaarheid afvalcontainers"
    ],
    "beleving_labels": [
        "Op de hoogte houden",
        "Statusinformatie",
        "Communicatie"
    ]
}

Dataset Information

The model was trained on the UWV/wim_instruct_signaalberichten_to_jsonld_agent_steps dataset, which contains:

  • Source: Signaalberichten (citizen complaints to municipalities)
  • Domain: Phone transcripts and written complaints about municipal services
  • N5 Examples: 4,525 complaint categorization tasks
  • Average Token Length: 1,636 tokens
  • Max Token Length: 2,332 tokens
  • Format: ChatML-formatted instruction-following examples
  • Task: Categorize complaints into predefined topic and experience labels

Important: This is a different task and dataset from the WIM pipeline (N1-N4), which focuses on Wikipedia-to-JSON-LD conversion.
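
To inspect the data yourself, it loads with the datasets library. A sketch; the split name is an assumption, so check the dataset card:

from datasets import load_dataset

# Split and column names are assumptions; verify against the dataset card.
ds = load_dataset("UWV/wim_instruct_signaalberichten_to_jsonld_agent_steps", split="train")
print(ds)
print(ds[0])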

Training Results

The model completed 3.1 epochs through the dataset:

  • Final Training Loss: 0.7864
  • Training Efficiency: 2.209 samples/second

Loss Progression

  • Training loss started at ~1.13
  • Rapid improvement during the first epoch
  • Stable convergence for the remainder of training
  • Final learning rate: 6.26e-10 (cosine decay)
  • Gradient norms: stable around 0.6-0.7

Model Versions

  • Merged Model: UWV/wim-n5-phi4-mini-merged

    • Note: the merge step failed due to a known Phi-4 issue, so the repository contains the adapter weights instead
    • The model still works fine for inference
  • LoRA Adapter: UWV/wim-n5-phi4-mini-adapter (~2.29 GB)

    • Requires the base microsoft/Phi-4-mini-instruct model
    • Large adapter size due to r=512
    • Includes the full training configuration

Model Context

Note: Despite the "n5" naming, this model is NOT part of the WIM (Wikipedia to Knowledge Graph) pipeline that includes N1-N4. This is a separate task focused on complaint categorization.

WIM Pipeline (Wikipedia to JSON-LD):

  1. N1: Entity Extraction from Wikipedia text
  2. N2: Schema.org Type Selection for entities
  3. N3: Transform to JSON-LD format
  4. N4: Validation of JSON-LD

This Model (N5 - Complaint Categorization):

  • Task: Categorize citizen complaints into topic and experience labels
  • Dataset: Signaalberichten (municipal complaints)
  • Domain: Government services and citizen interactions

Performance Characteristics

  • Sequence Length: Average 1,636 tokens (moderate length)
  • Batch Processing: Can handle batch size 8 with 4K context (see the batching sketch after this list)
  • Inference Speed: Fast label addition to existing JSON-LD
  • Memory Usage: ~10GB VRAM with 4K context
  • Domain: Specialized for Dutch government/municipal contexts
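
A sketch of batched labeling, reusing model and tokenizer from Option 1; build_messages is a hypothetical helper that wraps a complaint in the prompt shown there:

# Left-pad so all sequences end where generation begins.
tokenizer.padding_side = "left"
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

complaints = ["...", "..."]  # up to ~8 texts per batch with 4K context
prompts = [
    tokenizer.apply_chat_template(build_messages(c), tokenize=False, add_generation_prompt=True)
    for c in complaints
]
batch = tokenizer(prompts, return_tensors="pt", padding=True,
                  truncation=True, max_length=4096).to(model.device)

with torch.no_grad():
    out = model.generate(**batch, max_new_tokens=1000, temperature=0.1,
                         do_sample=True, top_p=0.95)

for seq in out:
    # Slice off the (padded) prompt; decode only the generated labels.
    print(tokenizer.decode(seq[batch["input_ids"].shape[1]:], skip_special_tokens=True))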

Citation

If you use this model, please cite:

@misc{wim-n5-phi4-mini,
  author = {UWV InnovatieHub},
  title = {Phi-4-mini N5 Complaint Categorization Model},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/UWV/wim-n5-phi4-mini-merged}
}