---
base_model: unsloth/Phi-4-mini-instruct-bnb-4bit
library_name: peft
model_name: n5_label_addition_model
tags:
- base_model:adapter:unsloth/Phi-4-mini-instruct-bnb-4bit
- lora
- sft
- transformers
- trl
- unsloth
pipeline_tag: text-generation
license: apache-2.0
datasets:
- UWV/wim-instruct-signaalberichten-to-jsonld-agent-steps
language:
- nl
---

# Phi-4-mini N5 Complaint Categorization Fine-tune

This model is a fine-tuned version of [microsoft/Phi-4-mini-instruct](https://huggingface.co/microsoft/Phi-4-mini-instruct), optimized for categorizing citizen complaints into predefined topic (onderwerp) and experience (beleving) labels. It was trained on the signaalberichten dataset.

## Model Details

### Model Description

- **Developed by:** UWV InnovatieHub
- **Model type:** Causal Language Model with LoRA fine-tuning
- **Language(s):** Dutch (nl)
- **License:** Apache 2.0
- **Finetuned from:** microsoft/Phi-4-mini-instruct (3.82B parameters)
- **Training Framework:** Unsloth (memory-efficient LoRA training)

### Training Details

- **Dataset:** [UWV/wim-instruct-signaalberichten-to-jsonld-agent-steps](https://huggingface.co/datasets/UWV/wim-instruct-signaalberichten-to-jsonld-agent-steps)
- **Dataset Size:** 4,525 N5-specific examples (label addition tasks)
- **Training Duration:** 1 hour 44 minutes
- **Hardware:** NVIDIA A100 80GB
- **Epochs:** 3.1
- **Steps:** 1,735
- **Training Metrics:**
  - Final Training Loss: 0.7864
  - Final Eval Loss: 0.7796
  - Training samples/second: 2.209
  - Learning rate (final): 6.26e-10

### LoRA Configuration

```python
{
    "r": 512,            # Unusually large rank, chosen for output quality
    "lora_alpha": 1024,  # Alpha = 2 x rank
    "lora_dropout": 0.1, # Higher dropout to counter overfitting on a small dataset
    "bias": "none",
    "task_type": "CAUSAL_LM",
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj"  # Attention layers only
    ]
}
```

### Training Configuration

```python
{
    "model": "phi4-mini",
    "max_seq_length": 4096,
    "batch_size": 8,
    "gradient_accumulation_steps": 1,
    "effective_batch_size": 8,
    "learning_rate": 2e-5,
    "warmup_steps": 50,
    "max_grad_norm": 1.0,
    "lr_scheduler": "cosine",
    "optimizer": "paged_adamw_8bit",
    "bf16": True,
    "seed": 42
}
```
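Put together, the two configuration blocks above map onto an Unsloth + TRL training run roughly as sketched below. This is a minimal reconstruction, not the actual training script: the dataset split, the assumption that the ChatML examples are pre-rendered into a `text` column, and `output_dir` are guesses, and the exact `SFTTrainer` keyword names vary between `trl` versions.

```python
from unsloth import FastLanguageModel
from trl import SFTConfig, SFTTrainer
from datasets import load_dataset

# Load the 4-bit base model listed in the card metadata
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Phi-4-mini-instruct-bnb-4bit",
    max_seq_length=4096,
    load_in_4bit=True,
)

# Attach LoRA adapters with the configuration shown above
model = FastLanguageModel.get_peft_model(
    model,
    r=512,
    lora_alpha=1024,
    lora_dropout=0.1,
    bias="none",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Assumption: the ChatML-formatted examples live in a "text" column
dataset = load_dataset(
    "UWV/wim-instruct-signaalberichten-to-jsonld-agent-steps", split="train"
)

trainer = SFTTrainer(
    model=model,
    processing_class=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(
        dataset_text_field="text",
        per_device_train_batch_size=8,
        gradient_accumulation_steps=1,
        learning_rate=2e-5,
        warmup_steps=50,
        max_grad_norm=1.0,
        lr_scheduler_type="cosine",
        optim="paged_adamw_8bit",
        bf16=True,
        seed=42,
        num_train_epochs=3,
        output_dir="outputs",
    ),
)
trainer.train()
```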
## Intended Uses & Limitations

### Intended Uses

- **Complaint Categorization**: Classify citizen complaints into topic and experience categories
- **Municipal Service Analysis**: Analyze phone transcripts and written complaints
- **Topic Detection**: Identify what a complaint is about (e.g., waste, parking, permits)
- **Experience Analysis**: Determine how citizens experience the service (e.g., communication, speed, clarity)

### Limitations

- Trained on the signaalberichten dataset (Dutch municipal complaints)
- Fixed label vocabulary (cannot create new labels)
- Best performance on complaint/service interaction texts
- Limited to a 4K-token context (sufficient for most complaints)
- Specific to Dutch government/municipal contexts

## How to Use

### Option 1: Using the Merged Model (Recommended)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the merged model (ready to use)
model = AutoModelForCausalLM.from_pretrained(
    "UWV/wim-n5-phi4-mini-merged",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained("UWV/wim-n5-phi4-mini-merged")

# Prepare input - complaint text for categorization
complaint_text = """
Burger: Nou, waar ik dus over wil klagen is het afval in de buurt. Het is
echt niet normaal meer, met al die vuilniszakken die op straat worden
gegooid. De containers zijn vaak vol en er komen ook ratten. Ik had al eens
gebeld maar er wordt niks aan gedaan!
"""

messages = [
    {
        "role": "system",
        "content": "Jij bent een expert in het toewijzen van labels aan een tekst."
    },
    {
        "role": "user",
        "content": f"""Analyseer de onderstaande tekst en bepaal welke labels van toepassing zijn.

**Onderwerp labels** (selecteer wat van toepassing is):
Vuil/ongedierte overlast, Bruikbaarheid/beschikbaarheid afvalcontainers, Parkeeroverlast, Vergunningen, etc.

**Beleving labels** (selecteer wat van toepassing is):
Communicatie, Op de hoogte houden, Statusinformatie, Snelheid van afhandeling, etc.

**Tekst om te analyseren**:
{complaint_text}"""
    }
]

# Apply chat template and generate
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=4096)
inputs = {k: v.to(model.device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=1000,
        temperature=0.1,  # Low temperature for consistent labeling
        do_sample=True,
        top_p=0.95,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens, skipping the prompt
generated = outputs[0][inputs["input_ids"].shape[1]:]
response = tokenizer.decode(generated, skip_special_tokens=True)
print(response)
```

### Option 2: Using the LoRA Adapter

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-4-mini-instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

# Load adapter
model = PeftModel.from_pretrained(
    base_model,
    "UWV/wim-n5-phi4-mini-adapter"
)
tokenizer = AutoTokenizer.from_pretrained("UWV/wim-n5-phi4-mini-adapter")

# Use the same inference code as above...
```

## Expected Output Format

The model outputs a JSON response with categorization results:

```json
{
  "reasoning": "Omdat de burger klaagt over afval dat op straat wordt gegooid, volle containers en rattenoverlast, zijn de onderwerpen 'Vuil/ongedierte overlast' en 'Bruikbaarheid/beschikbaarheid afvalcontainers' het meest van toepassing. De beleving is negatief: de burger ervaart frustratie over het uitblijven van actie en het gebrek aan terugkoppeling.",
  "onderwerp_labels": [
    "Vuil/ongedierte overlast",
    "Bruikbaarheid/beschikbaarheid afvalcontainers"
  ],
  "beleving_labels": [
    "Op de hoogte houden",
    "Statusinformatie",
    "Communicatie"
  ]
}
```
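Because generation is sampled, the JSON is not guaranteed to be well formed on every run. The helper below is an illustrative, defensive way to consume the output; it is not part of the released code, and the empty-fallback behaviour is an assumption.

```python
import json

def parse_labels(response: str) -> dict:
    """Parse the model's JSON response following the schema above.

    Falls back to empty label lists when the output is not valid JSON.
    Slicing from the first "{" to the last "}" also strips any stray
    prose the model may emit around the object.
    """
    try:
        start = response.index("{")
        end = response.rindex("}") + 1
        data = json.loads(response[start:end])
    except ValueError:  # covers both index() misses and JSONDecodeError
        return {"reasoning": "", "onderwerp_labels": [], "beleving_labels": []}
    return {
        "reasoning": data.get("reasoning", ""),
        "onderwerp_labels": data.get("onderwerp_labels", []),
        "beleving_labels": data.get("beleving_labels", []),
    }

labels = parse_labels(response)
print(labels["onderwerp_labels"], labels["beleving_labels"])
```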
## Dataset Information

The model was trained on the [UWV/wim-instruct-signaalberichten-to-jsonld-agent-steps](https://huggingface.co/datasets/UWV/wim-instruct-signaalberichten-to-jsonld-agent-steps) dataset, which contains:

- **Source**: Signaalberichten (citizen complaints to municipalities)
- **Domain**: Phone transcripts and written complaints about municipal services
- **N5 Examples**: 4,525 complaint categorization tasks
- **Average Token Length**: 1,636 tokens
- **Max Token Length**: 2,332 tokens
- **Format**: ChatML-formatted instruction-following examples
- **Task**: Categorize complaints into predefined topic and experience labels

**Important**: This is a different task and dataset from the WIM pipeline (N1-N4), which focuses on Wikipedia-to-JSON-LD conversion.

## Training Results

The model trained for 3.1 epochs over the dataset:

- **Final Training Loss**: 0.7864
- **Training Efficiency**: 2.209 samples/second

### Loss Progression

- Started at ~1.13 loss
- Rapid improvement during the first epoch
- Stable convergence throughout training
- Final learning rate: 6.26e-10 (cosine decay)
- Gradient norms: stable around 0.6-0.7

## Model Versions

- **Merged Model**: `UWV/wim-n5-phi4-mini-merged`
  - Due to a known Phi-4 merging issue, this repository contains the adapter weights rather than fully merged weights
  - The model nevertheless loads and runs correctly for inference
- **LoRA Adapter**: `UWV/wim-n5-phi4-mini-adapter` (~2.29 GB)
  - Requires the base Phi-4-mini-instruct model
  - Large adapter size due to r=512
  - Includes all training configurations

## Model Context

**Note**: Despite the "n5" naming, this model is NOT part of the WIM (Wikipedia to Knowledge Graph) pipeline that comprises N1-N4. It addresses a separate task: complaint categorization.

### WIM Pipeline (Wikipedia to JSON-LD):

1. **N1**: Entity extraction from Wikipedia text
2. **N2**: Schema.org type selection for entities
3. **N3**: Transformation to JSON-LD format
4. **N4**: Validation of JSON-LD

### This Model (N5 - Complaint Categorization):

- **Task**: Categorize citizen complaints into topic and experience labels
- **Dataset**: Signaalberichten (municipal complaints)
- **Domain**: Government services and citizen interactions

## Performance Characteristics

- **Sequence Length**: 1,636 tokens on average (moderate length)
- **Batch Processing**: Handles batch size 8 at 4K context (see the batched-inference sketch at the end of this card)
- **Inference Speed**: Fast; generates a compact JSON label object per complaint
- **Memory Usage**: ~10GB VRAM with 4K context
- **Domain**: Specialized for Dutch government/municipal contexts

## Citation

If you use this model, please cite:

```bibtex
@misc{wim-n5-phi4-mini,
  author = {UWV InnovatieHub},
  title = {Phi-4-mini N5 Complaint Categorization Model},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/UWV/wim-n5-phi4-mini-merged}
}
```
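For completeness, the batch-size-8 claim under Performance Characteristics can be exercised with a sketch like the following. It is illustrative only and was not benchmarked for this card; `complaints` is a hypothetical list of inputs, and the abbreviated prompt omits the label instructions shown earlier.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "UWV/wim-n5-phi4-mini-merged",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("UWV/wim-n5-phi4-mini-merged")
tokenizer.padding_side = "left"  # left-pad so every prompt ends where generation begins
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

complaints = ["<klacht 1>", "<klacht 2>"]  # hypothetical batch of complaint texts

prompts = [
    tokenizer.apply_chat_template(
        [{"role": "user", "content": f"Analyseer de tekst en bepaal de labels.\n\n{text}"}],
        tokenize=False,
        add_generation_prompt=True,
    )
    for text in complaints
]

inputs = tokenizer(
    prompts, return_tensors="pt", padding=True, truncation=True, max_length=4096
).to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=1000,
        do_sample=True,
        temperature=0.1,
        top_p=0.95,
        pad_token_id=tokenizer.pad_token_id,
    )

# Decode only the generated continuation of each row
for row in outputs[:, inputs["input_ids"].shape[1]:]:
    print(tokenizer.decode(row, skip_special_tokens=True))
```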