Model Card for iconclass-vlm
This model is a fine-tuned version of Qwen/Qwen2.5-VL-3B-Instruct on the davanstrien/iconclass-vlm-sft dataset.
You can explore the predictions of this model using this Space.
Model Description
This vision-language model has been fine-tuned to generate Iconclass classification codes from images. Iconclass is a comprehensive classification system for describing the content of images, particularly used in cultural heritage and art history contexts.
The model was trained using Supervised Fine-Tuning (SFT) with TRL on a reformatted version of the Brill Iconclass AI Test Set, which contains 87,744 images with expert-assigned Iconclass labels.
Intended Use
- Primary use case: Automatic classification of art and cultural heritage images using Iconclass notation
- Users: Digital humanities researchers, museum professionals, art historians, and developers working with cultural heritage collections
Quick Start
Simple Pipeline Approach
from transformers import pipeline
from PIL import Image

# Load pipeline
pipe = pipeline("image-text-to-text", model="davanstrien/iconclass-vlm")

# Load your image
image = Image.open("your_artwork.jpg")

# Prepare messages
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Generate Iconclass labels for this image"},
        ],
    }
]

# Generate with beam search for better results
output = pipe(messages, max_new_tokens=800, num_beams=4)
print(output[0]["generated_text"])
Alternative Approach with AutoModel
from transformers import AutoProcessor, AutoModelForVision2Seq
from PIL import Image

model_name = "davanstrien/iconclass-vlm"
processor = AutoProcessor.from_pretrained(model_name)
model = AutoModelForVision2Seq.from_pretrained(model_name)

# Load your image
image = Image.open("your_artwork.jpg")

# Prepare inputs
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Generate Iconclass labels for this image"},
        ],
    }
]

# Render the chat template to a prompt string, then tokenize it with the image
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt")

# Generate with beam search
outputs = model.generate(**inputs, max_new_tokens=800, num_beams=4)

# Decode only the newly generated tokens, skipping the prompt
generated = outputs[0][inputs["input_ids"].shape[1]:]
response = processor.decode(generated, skip_special_tokens=True)
print(response)
Training Dataset
The model was trained on a reformatted version of the Brill Iconclass AI Test Set (biglam/brill_iconclass).
The dataset was reformatted into a messages format suitable for SFT training.
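As a rough illustration of what such a reformatting step might look like, the sketch below converts one record into a chat-style example. The field names `image` and `label`, the helper name `to_messages`, the `images`/`messages` output layout, and the comma-joined target string are all assumptions made for illustration; the actual conversion used for davanstrien/iconclass-vlm-sft may differ. The prompt text is the one used in the Quick Start, and the Iconclass codes are illustrative values only.

```python
from typing import Any

# Prompt text taken from the Quick Start examples
PROMPT = "Generate Iconclass labels for this image"


def to_messages(record: dict[str, Any]) -> dict[str, Any]:
    """Convert one {image, label} record into a chat-style SFT example.

    Hypothetical helper: the field names and the target serialization
    are assumptions, not the dataset's confirmed schema.
    """
    # Assumption: labels are joined with ", "; the real dataset may
    # serialize them differently (for example as JSON).
    target = ", ".join(record["label"])
    return {
        "images": [record["image"]],
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image"},
                    {"type": "text", "text": PROMPT},
                ],
            },
            {
                "role": "assistant",
                "content": [{"type": "text", "text": target}],
            },
        ],
    }


# A string stands in for the image here; real records would carry a PIL image
example = to_messages({"image": "placeholder-image", "label": ["25F23(LION)", "31A235"]})
print(example["messages"][1]["content"][0]["text"])
```

A function of this shape could be applied record by record, for instance through `Dataset.map` in the Datasets library, to produce user/assistant pairs for supervised fine-tuning.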
Training Procedure
This model was trained with SFT (Supervised Fine-Tuning).
Framework Versions
TRL: 0.22.1
Transformers: 4.55.2
PyTorch: 2.8.0
Datasets: 4.0.0
Tokenizers: 0.21.4
Limitations and Biases
- The Iconclass classification system reflects biases from its creation period (1940s Netherlands)
- Certain categories, particularly those related to human classification, may contain outdated or problematic terminology
- Model performance may vary on images outside the Western art tradition due to dataset composition
Citations
Model and Training
@misc{vonwerra2022trl,
title = {{TRL: Transformer Reinforcement Learning}},
author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
year = 2020,
journal = {GitHub repository},
publisher = {GitHub},
howpublished = {\url{https://github.com/huggingface/trl}}
}
Dataset
@misc{iconclass,
title = {Brill Iconclass AI Test Set},
author = {Etienne Posthumus},
year = {2020}
}