YOLOv12 for Comic Panel Detection
This repository contains a YOLOv12x object detection model fine-tuned to detect individual panels in comic book pages. The model identifies the bounding boxes for each panel, making it a valuable tool for digitizing comics, extracting content, or building datasets for downstream analysis.
This model was trained in PyTorch using the powerful ultralytics library and demonstrates high performance on a custom-annotated dataset of comic pages.
Visit this space to try out the model right now: The_Best_Comic_Panel_Detection.
Model Details
- Architecture: YOLOv12x (the extra-large variant)
- Fine-tuned on: A custom Roboflow dataset named "Custom-Workflow-3-Object-Detection-1".
- Classes: Comic Panel
- Frameworks: PyTorch, Ultralytics
How to Get Started
You can easily use this model with the ultralytics library. The model file best.pt from this repository is required.
# 1. Install Ultralytics (notebook syntax; in a terminal, run the command without the leading '!')
!pip install ultralytics
from ultralytics import YOLO
from PIL import Image
# 2. Load the fine-tuned model
# Make sure 'best.pt' is in your current directory
model = YOLO('best.pt')
# 3. Run inference on an image
image_path = 'path/to/your/comic_page.jpg'
results = model.predict(source=image_path)
# 4. Process and visualize results
# The 'results' object contains bounding boxes, classes, and confidence scores
for result in results:
    # Plotting will draw the bounding boxes on the image
    im_array = result.plot()
    im = Image.fromarray(im_array[..., ::-1])  # Convert BGR to RGB
    im.show()  # Display the image
    # or
    # im.save('prediction_result.jpg')

# You can also access bounding box data directly
for box in results[0].boxes:
    print("Class:", model.names[int(box.cls)])
    print("Confidence:", box.conf.item())
    print("Coordinates (xyxy):", box.xyxy[0].tolist())
    print("-" * 20)
Training Procedure
The model was fine-tuned using transfer learning from a YOLOv12x checkpoint pre-trained on the COCO dataset.
Training Hyperparameters
- Image Size: 640x640
- Batch Size: 16
- Optimizer: AdamW (lr=0.002)
- Epochs: 200
- Patience: 100 epochs for early stopping
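The hyperparameters above map onto the Ultralytics training API roughly as follows. This is a minimal sketch, not the author's training script: the starting checkpoint name `yolov12x.pt` and the dataset file `data.yaml` are assumed placeholders, and every setting not listed above is left at the library default.

```python
# Hyperparameters listed above, expressed as Ultralytics train() arguments.
TRAIN_ARGS = {
    "imgsz": 640,          # Image Size: 640x640
    "batch": 16,           # Batch Size
    "optimizer": "AdamW",  # Optimizer
    "lr0": 0.002,          # initial learning rate
    "epochs": 200,         # maximum number of epochs
    "patience": 100,       # early-stopping patience, in epochs
}


def fine_tune(weights="yolov12x.pt", data="data.yaml"):
    """Fine-tune from a pre-trained checkpoint.

    Both defaults are hypothetical placeholders: point `weights` at your
    COCO-pretrained checkpoint and `data` at your dataset YAML file.
    """
    # Imported lazily so the settings above can be inspected without the library.
    from ultralytics import YOLO

    model = YOLO(weights)
    return model.train(data=data, **TRAIN_ARGS)


# fine_tune()  # uncomment to start training
```

Naming the optimizer explicitly is deliberate here: to my understanding, the default `optimizer="auto"` lets Ultralytics choose the optimizer and learning rate itself, in which case `lr0` would not take effect.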
Evaluation
The model's performance was evaluated on the validation set during training. The final metrics are based on the checkpoint that achieved the highest mAP50-95.
Key Performance Metrics
| Metric | Value | Description |
|---|---|---|
| mAP50 | 0.991 | Mean Average Precision at IoU threshold 0.50. |
| mAP50-95 | 0.985 | Mean Average Precision averaged over IoU thresholds from 0.50 to 0.95. |
These near-perfect mAP scores on the validation data indicate a strong ability to correctly identify comic panels within the styles present in the dataset.
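Both metrics rest on Intersection over Union (IoU) between a predicted box and a ground-truth box. The sketch below is illustrative only and is not the evaluation code used for this model; the names `iou_xyxy` and `MAP_THRESHOLDS` and the box coordinates are made up for the example. It computes IoU for boxes in the same xyxy format the model returns and lists the ten thresholds that mAP50-95 averages over.

```python
def iou_xyxy(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle, if there is one.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# mAP50-95 averages AP over IoU thresholds 0.50, 0.55, ..., 0.95.
MAP_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]

ground_truth = (100, 100, 500, 400)
prediction = (130, 100, 500, 400)  # left edge is off by 30 px

score = iou_xyxy(prediction, ground_truth)
matched = [t for t in MAP_THRESHOLDS if score >= t]
print(f"IoU = {score:.3f}")
print("counts as a correct detection at thresholds:", matched)
```

A prediction like this one counts as correct for mAP50 but is rejected at the strictest 0.95 threshold, which is why mAP50-95 rewards tightly fitting boxes more than mAP50 does.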
Qualitative Results
The model correctly identifies panels of various sizes and layouts in the validation set.
Intended Use and Limitations
This model is intended for applications that need to split comic book pages into their constituent panels using bounding boxes. This can be a pre-processing step for:
- Creating structured digital reading experiences.
- Extracting text or characters from individual panels.
- Analyzing comic book layouts and artistic styles.
The model has been tested in real-world applications and has shown promising results.
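For the pre-processing uses listed above, the detected boxes usually need to be put in reading order and cropped out of the page. The sketch below is one possible approach and not part of the model or this repository: `sort_reading_order`, `crop_panels`, and the `row_tolerance` parameter are hypothetical names, the row-grouping rule is a simple heuristic, and left-to-right reading is assumed (reverse the within-row sort for right-to-left comics).

```python
def sort_reading_order(boxes, row_tolerance=0.5):
    """Order xyxy boxes top-to-bottom, then left-to-right within a row.

    A box joins the current row when its top edge lies within
    row_tolerance * (height of the row's first box) of that box's top edge.
    """
    rows = []
    for box in sorted(boxes, key=lambda b: b[1]):  # sort by top edge
        if rows:
            first = rows[-1][0]
            if box[1] - first[1] <= row_tolerance * (first[3] - first[1]):
                rows[-1].append(box)
                continue
        rows.append([box])
    return [b for row in rows for b in sorted(row, key=lambda b: b[0])]


def crop_panels(image_path, boxes):
    """Return one PIL image per xyxy box, in the order given."""
    from PIL import Image  # imported lazily; requires Pillow

    page = Image.open(image_path)
    return [page.crop(tuple(int(round(v)) for v in box)) for box in boxes]


# Made-up coordinates: two panels side by side, one wide panel below them.
detected = [
    [20, 420, 780, 800],  # bottom strip
    [410, 25, 780, 400],  # top right
    [20, 20, 390, 400],   # top left
]
ordered = sort_reading_order(detected)
print(ordered)
```

With real predictions, the input list can be built from the inference results as `[box.xyxy[0].tolist() for box in results[0].boxes]`, and `crop_panels(image_path, ordered)` then yields the panel images in reading order. Irregular layouts may need a different ordering rule.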
Limitations
- Non-Rectangular Panels: The model is trained to detect rectangular bounding boxes and may struggle with highly irregular or overlapping panel shapes.
Acknowledgements
- Ultralytics for the amazing YOLOv12 model and library.
- Roboflow for their dataset hosting platform, and custom-workflow-3-object-detection-g24r5-fmfkb for compiling and annotating this incredible dataset.
This model card is based on the training notebook YOLOV12-Comic-Panel-Detection.