YOLOv12 for Comic Panel Detection

This repository contains a YOLOv12x object detection model fine-tuned to detect individual panels in comic book pages. The model identifies the bounding boxes for each panel, making it a valuable tool for digitizing comics, extracting content, or building datasets for downstream analysis.

This model was trained in PyTorch using the Ultralytics library and reaches an mAP50-95 of 0.985 on the validation split of a custom-annotated dataset of comic pages.

Visit this space to try out the model right now: The_Best_Comic_Panel_Detection.

Model Details

  • Architecture: YOLOv12x (the extra-large variant)
  • Fine-tuned on: A custom Roboflow dataset named "Custom-Workflow-3-Object-Detection-1".
  • Classes: Comic Panel
  • Frameworks: PyTorch, Ultralytics

How to Get Started

You can easily use this model with the ultralytics library. The model file best.pt from this repository is required.

# 1. Install Ultralytics (run this in a shell; in a notebook, prefix it with '!')
# pip install ultralytics

from ultralytics import YOLO
from PIL import Image

# 2. Load the fine-tuned model
# Make sure 'best.pt' is in your current directory
model = YOLO('best.pt')

# 3. Run inference on an image
image_path = 'path/to/your/comic_page.jpg'
results = model.predict(source=image_path)

# 4. Process and visualize results
# The 'results' object contains bounding boxes, classes, and confidence scores
for result in results:
    # Plotting will draw the bounding boxes on the image
    im_array = result.plot()
    im = Image.fromarray(im_array[..., ::-1]) # Convert BGR to RGB
    im.show() # Display the image
    # or
    # im.save('prediction_result.jpg')

# You can also access bounding box data directly
for box in results[0].boxes:
    print("Class:", model.names[int(box.cls)])
    print("Confidence:", box.conf.item())
    print("Coordinates (xyxy):", box.xyxy[0].tolist())
    print("-" * 20)
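Since the boxes are plain pixel coordinates, each detected panel can be cut out of the page as its own image. The following is a minimal sketch, not part of this repository: `clamp_box` and `crop_panels` are hypothetical helper names, and the code assumes Pillow is installed and that the boxes come from `results[0].boxes.xyxy.tolist()`.

```python
def clamp_box(xyxy, width, height):
    """Round an (x1, y1, x2, y2) box to ints and clip it to the image bounds."""
    x1, y1, x2, y2 = xyxy
    x1 = max(0, min(int(round(x1)), width))
    y1 = max(0, min(int(round(y1)), height))
    x2 = max(0, min(int(round(x2)), width))
    y2 = max(0, min(int(round(y2)), height))
    return x1, y1, x2, y2


def crop_panels(image_path, boxes_xyxy, min_size=1):
    """Return one PIL image per box, skipping boxes that collapse after clipping."""
    from PIL import Image  # imported lazily so clamp_box works without Pillow

    page = Image.open(image_path).convert("RGB")
    crops = []
    for xyxy in boxes_xyxy:
        x1, y1, x2, y2 = clamp_box(xyxy, page.width, page.height)
        if x2 - x1 >= min_size and y2 - y1 >= min_size:
            crops.append(page.crop((x1, y1, x2, y2)))
    return crops


# Hypothetical usage with the 'results' and 'image_path' from the snippet above:
# panels = crop_panels(image_path, results[0].boxes.xyxy.tolist())
# for i, panel in enumerate(panels):
#     panel.save(f"panel_{i:02d}.jpg")
```

Clipping before cropping avoids slightly out-of-bounds coordinates producing padded or empty crops.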

Training Procedure

The model was fine-tuned using transfer learning from a YOLOv12x checkpoint pre-trained on the COCO dataset.

Training Hyperparameters

  • Image Size: 640x640
  • Batch Size: 16
  • Optimizer: AdamW (lr=0.002)
  • Epochs: 200
  • Patience: 100 epochs for early stopping
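The hyperparameters above map fairly directly onto arguments of the Ultralytics `train` call. The sketch below is an approximation rather than the original training script: the dataset YAML path is a placeholder, the `yolo12x.pt` weights name is assumed, and mapping "lr=0.002" to the initial learning rate `lr0` is also an assumption.

```python
def build_train_args(data_yaml):
    """Collect the listed hyperparameters as keyword arguments for model.train()."""
    return {
        "data": data_yaml,      # path to the dataset YAML (placeholder)
        "imgsz": 640,           # 640x640 training images
        "batch": 16,
        "optimizer": "AdamW",
        "lr0": 0.002,           # assumed to be the initial learning rate
        "epochs": 200,
        "patience": 100,        # early stopping after 100 epochs without improvement
    }


def fine_tune(data_yaml, weights="yolo12x.pt"):
    """Fine-tune from a pre-trained checkpoint (weights name is an assumption)."""
    from ultralytics import YOLO  # imported lazily; requires 'pip install ultralytics'

    model = YOLO(weights)
    return model.train(**build_train_args(data_yaml))


# Hypothetical usage:
# fine_tune("path/to/data.yaml")
```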

Training and Validation Metrics

Evaluation

The model's performance was evaluated on the validation set during training. The final metrics are based on the checkpoint that achieved the highest mAP50-95.

Key Performance Metrics

Metric   | Value | Description
---------|-------|------------
mAP50    | 0.991 | Mean Average Precision at IoU threshold 0.50.
mAP50-95 | 0.985 | Mean Average Precision averaged over IoU thresholds from 0.50 to 0.95.

These scores are near-perfect on the validation data, indicating a strong ability to localize comic panels within the styles present in the dataset.
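To make the IoU thresholds behind these metrics concrete, here is a small illustration of intersection over union for two boxes in xyxy format, together with the ten thresholds (0.50 to 0.95 in steps of 0.05) that mAP50-95 averages over. It only sketches the matching criterion and is not the Ultralytics validation code; `iou` and `thresholds_passed` are illustrative names.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# The ten IoU thresholds used for mAP50-95: 0.50, 0.55, ..., 0.95
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred, truth):
    """Thresholds at which a prediction would count as a match for a ground-truth box."""
    score = iou(pred, truth)
    return [t for t in IOU_THRESHOLDS if score >= t]
```

A predicted panel that overlaps its annotation with IoU 0.87 counts as correct at the eight thresholds up to 0.85 but not at 0.90 or 0.95, which is why mAP50-95 is the stricter of the two numbers.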

Confusion Matrix

Qualitative Results

The model correctly identifies panels of various sizes and layouts in the validation set.

Validation Predictions

Intended Use and Limitations

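This model is intended for applications that need to split comic book pages into their constituent panels. Because it outputs bounding boxes rather than pixel masks, it performs detection, not pixel-level segmentation. This can be a pre-processing step for: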

  • Creating structured digital reading experiences.
  • Extracting text or characters from individual panels.
  • Analyzing comic book layouts and artistic styles.
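For reading-oriented uses, the detected panels usually need to be put in some order, which the model itself does not provide. The following is a rough heuristic sketch, assuming a left-to-right, top-to-bottom layout: boxes that overlap vertically are grouped into rows, and each row is read from left to right. The function name and the `row_tolerance` parameter are hypothetical, and irregular layouts or right-to-left comics would need a different rule.

```python
def sort_reading_order(boxes, row_tolerance=0.5):
    """Order (x1, y1, x2, y2) boxes row by row, left to right within each row."""
    rows = []
    for box in sorted(boxes, key=lambda b: b[1]):  # scan from the top of the page
        x1, y1, x2, y2 = box
        placed = False
        for row in rows:
            top, bottom = row["top"], row["bottom"]
            overlap = min(y2, bottom) - max(y1, top)
            # Same row if the vertical overlap covers enough of the shorter span
            if overlap >= row_tolerance * min(y2 - y1, bottom - top):
                row["boxes"].append(box)
                row["top"] = min(top, y1)
                row["bottom"] = max(bottom, y2)
                placed = True
                break
        if not placed:
            rows.append({"top": y1, "bottom": y2, "boxes": [box]})

    ordered = []
    for row in sorted(rows, key=lambda r: r["top"]):
        ordered.extend(sorted(row["boxes"], key=lambda b: b[0]))
    return ordered
```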

Beyond the validation set, the model has been tested in real-world applications and has shown promising results.

Limitations

  • Non-Rectangular Panels: The model is trained to detect rectangular bounding boxes and may struggle with highly irregular or overlapping panel shapes.

Acknowledgements

  • Ultralytics for the amazing YOLOv12 model and library.
  • Roboflow for their dataset hosting platform, and custom-workflow-3-object-detection-g24r5-fmfkb for compiling and annotating this incredible dataset.

This model card is based on the training notebook YOLOV12-Comic-Panel-Detection.
