Model Card for iconclass-vlm

This model is a fine-tuned version of Qwen/Qwen2.5-VL-3B-Instruct on the davanstrien/iconclass-vlm-sft dataset.

You can explore the predictions of this model using this Space.

Model Description

This vision-language model has been fine-tuned to generate Iconclass classification codes from images. Iconclass is a comprehensive classification system for describing the content of images, particularly used in cultural heritage and art history contexts.

The model was trained using Supervised Fine-Tuning (SFT) with TRL on a reformatted version of the Brill Iconclass AI Test Set, which contains 87,744 images with expert-assigned Iconclass labels.

Intended Use

Primary use case: Automatic classification of art and cultural heritage images using Iconclass notation
Users: Digital humanities researchers, museum professionals, art historians, and developers working with cultural heritage collections

Quick Start

Simple Pipeline Approach

from transformers import pipeline
from PIL import Image

# Load pipeline
pipe = pipeline("image-text-to-text", model="davanstrien/iconclass-vlm")

# Load your image
image = Image.open("your_artwork.jpg")

# Prepare messages
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Generate Iconclass labels for this image"}
        ]
    }
]

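# Generate with beam search (num_beams=4); generation options are passed via generate_kwargs
output = pipe(
    text=messages,
    max_new_tokens=800,
    generate_kwargs={"num_beams": 4},
    return_full_text=False,  # return only the model's reply, not the whole chat
)
print(output[0]["generated_text"])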

Alternative Approach with AutoModel

from transformers import AutoProcessor, AutoModelForVision2Seq
from PIL import Image

model_name = "davanstrien/iconclass-vlm"
processor = AutoProcessor.from_pretrained(model_name)
model = AutoModelForVision2Seq.from_pretrained(model_name)

# Load your image
image = Image.open("your_artwork.jpg")

# Prepare inputs
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Generate Iconclass labels for this image"}
        ]
    }
]

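# Render the chat template to a prompt string, then tokenize it together with the image
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt")

# Generate with beam search
outputs = model.generate(**inputs, max_new_tokens=800, num_beams=4)

# Decode only the newly generated tokens, skipping the prompt
generated = outputs[0][inputs["input_ids"].shape[1]:]
response = processor.decode(generated, skip_special_tokens=True)
print(response)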
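
Both snippets above return free text rather than a list of codes. The helper below is a minimal post-processing sketch, not part of the model's own tooling. It assumes the generated notations are separated by commas, semicolons, or newlines; the actual output format is not documented here, so check it against real predictions first. The function name `split_iconclass_output` and the example notations are illustrative.

```python
import re


def split_iconclass_output(text: str) -> list[str]:
    """Split generated text into candidate Iconclass notations.

    Assumption: notations are separated by commas, semicolons, or newlines.
    Bracketed text such as "(LION)" may contain spaces, so whitespace alone
    is not treated as a separator.
    """
    parts = re.split(r"[,;\n]+", text)
    seen = set()
    codes = []
    for part in parts:
        code = part.strip()
        # Drop empty fragments and repeated notations, keeping first-seen order
        if code and code not in seen:
            seen.add(code)
            codes.append(code)
    return codes


# Illustrative input only; real model output may be formatted differently
example = "25F23(LION), 11H(JEROME)\n25F23(LION)"
print(split_iconclass_output(example))
```

Deduplicating while preserving order keeps the result stable if beam search repeats a notation.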

Training Dataset

The model was trained on a reformatted version of the Brill Iconclass AI Test Set (biglam/brill_iconclass).

The dataset was reformatted into a messages format suitable for SFT.

Training Procedure

This model was trained with SFT (Supervised Fine-Tuning).
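
To give a sense of what a messages-format training record might look like, here is a minimal sketch. The `images` and `messages` field names follow the conversational layout that TRL's SFT tooling accepts for vision-language data, but the exact schema of davanstrien/iconclass-vlm-sft is not stated in this card. The function name `to_sft_messages`, the comma-joined target string, and the reuse of the Quick Start prompt are therefore assumptions.

```python
def to_sft_messages(image, codes, prompt="Generate Iconclass labels for this image"):
    """Build one conversational training record (hypothetical schema).

    image:  a PIL image (or any placeholder object when testing)
    codes:  list of Iconclass notations assigned to the image
    prompt: user instruction; assumed to match the Quick Start prompt
    """
    return {
        "images": [image],
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image"},
                    {"type": "text", "text": prompt},
                ],
            },
            {
                "role": "assistant",
                # Assumption: target labels are joined with ", "
                "content": [{"type": "text", "text": ", ".join(codes)}],
            },
        ],
    }


record = to_sft_messages("<image placeholder>", ["25F23(LION)", "11H(JEROME)"])
print(record["messages"][1]["content"][0]["text"])
```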

Framework Versions

TRL: 0.22.1
Transformers: 4.55.2
PyTorch: 2.8.0
Datasets: 4.0.0
Tokenizers: 0.21.4
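
When reproducing results, it can help to compare the local environment against the versions listed above. The following is a small sketch using only the standard library. The helper name `version_mismatches` is hypothetical, and the PyPI distribution names (for example `torch` for PyTorch) are assumptions about how the packages were installed.

```python
from importlib import metadata

# Versions listed in this model card, keyed by assumed PyPI distribution name
EXPECTED = {
    "trl": "0.22.1",
    "transformers": "4.55.2",
    "torch": "2.8.0",
    "datasets": "4.0.0",
    "tokenizers": "0.21.4",
}


def version_mismatches(expected, get_version=metadata.version):
    """Return {package: installed_version_or_None} for packages that differ."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            installed = None
        # Ignore local build suffixes such as "+cu128" when comparing
        base = installed.split("+")[0] if installed is not None else None
        if base != wanted:
            mismatches[package] = installed
    return mismatches


for package, installed in version_mismatches(EXPECTED).items():
    print(f"{package}: expected {EXPECTED[package]}, found {installed}")
```

A mismatch does not necessarily mean the model will fail to load; it only flags a difference from the training environment.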

Limitations and Biases

The Iconclass classification system reflects biases from its creation period (1940s Netherlands).
Certain categories, particularly those related to human classification, may contain outdated or problematic terminology.
Model performance may vary on images outside the Western art tradition due to dataset composition.

Citations

Model and Training

@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}

Dataset

@misc{iconclass,
    title = {Brill Iconclass AI Test Set},
    author = {Etienne Posthumus},
    year = {2020}
}
