YOLO11 Model for Drowsiness Detection

This repository contains a YOLO classification model fine-tuned to detect driver drowsiness from images. The model classifies input images into two categories: Drowsy and Non Drowsy (Awake).

The model was trained with the ultralytics framework and reaches 99.80% accuracy on an unseen test set (see Evaluation), which makes it a promising building block for driver-safety applications.

Model Details

Base Model: yolo11x-cls (Ultralytics YOLO11 classification model, distributed with the ultralytics 8.x package)
Fine-tuned on: A combined dataset for driver drowsiness detection.
Classes: Drowsy, Non Drowsy
Framework: PyTorch, Ultralytics

How to Get Started

You can easily use this model with the ultralytics library.

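# Install the dependencies first (shell command; prefix it with "!" inside a notebook):
#   pip install ultralytics huggingface_hub

from huggingface_hub import hf_hub_download
from ultralytics import YOLO

# Download the weights from the Hugging Face Hub, then pass the local path to YOLO().
# Both the repo id and the filename are placeholders; replace them with the
# actual repository and weights file.
weights_path = hf_hub_download(repo_id='your-username/your-repo-name', filename='your-weights-file.pt')
model = YOLO(weights_path)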

# Run inference on an image
image_path = 'path/to/your/image.jpg'
results = model.predict(image_path)

# Print the top prediction
probs = results[0].probs
top1_class_index = probs.top1
top1_confidence = probs.top1conf
class_name = model.names[top1_class_index]

print(f"Prediction: {class_name} with confidence {top1_confidence:.4f}")
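
Since a missed drowsy driver is usually the costlier error, an application may prefer to threshold the Drowsy probability instead of taking the top-1 class. The following is a minimal sketch of that idea, not part of the original notebook. It assumes the class names are exactly "Drowsy" and "Non Drowsy", that the per-class probabilities have been extracted as a plain list (for example via results[0].probs.data.tolist()), and that the threshold value is a hypothetical tuning knob. The helper names drowsy_probability and is_drowsy are illustrative only.

```python
def drowsy_probability(names, probs):
    """Return the probability assigned to the 'Drowsy' class.

    names: mapping of class index -> class name, shaped like model.names
    probs: list of per-class probabilities in class-index order
    """
    drowsy_index = next(i for i, name in names.items() if name == "Drowsy")
    return probs[drowsy_index]


def is_drowsy(names, probs, threshold=0.5):
    """Flag the image as drowsy when P(Drowsy) reaches the threshold.

    The threshold is a hypothetical knob: lowering it trades more
    false alarms for fewer missed drowsy drivers.
    """
    return drowsy_probability(names, probs) >= threshold


# Made-up probabilities for illustration (not real model output).
# The index order below is an assumption; check model.names in practice.
example_names = {0: "Drowsy", 1: "Non Drowsy"}
example_probs = [0.35, 0.65]

print(is_drowsy(example_names, example_probs))                 # default threshold
print(is_drowsy(example_names, example_probs, threshold=0.3))  # more cautious threshold
```

Any threshold other than the default should be chosen on a validation set, since it shifts the balance between missed detections and false alarms.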

Training Procedure

The model was fine-tuned on a large dataset of driver images. The training process involved:

Data Augmentation: Standard augmentations like random flips, color jitter (HSV), and scaling were applied.
Transfer Learning: The model was initialized with the pretrained yolo11x-cls weights (ImageNet), enabling rapid convergence.

Key Hyperparameters

Image Size: 224x224
Batch Size: 185 (auto-tuned)
Optimizer: SGD with momentum
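
The settings above could be passed to Ultralytics roughly as follows. This is a sketch under stated assumptions rather than the notebook's actual call. Image size, automatic batch sizing, and the SGD optimizer come from this card. The momentum and augmentation values are the Ultralytics library defaults and are assumed here, and the dataset path is a placeholder. The helper names build_train_args and fine_tune are hypothetical.

```python
def build_train_args(data_dir):
    """Collect keyword arguments for the Ultralytics model.train() call."""
    return {
        "data": data_dir,     # dataset root with train/ and val/ or test/ folders, one subfolder per class
        "imgsz": 224,         # image size stated in this card
        "batch": -1,          # -1 lets Ultralytics choose the batch size automatically
        "optimizer": "SGD",   # optimizer stated in this card
        "momentum": 0.937,    # assumed: library default
        "fliplr": 0.5,        # assumed: probability of a horizontal flip (library default)
        "hsv_h": 0.015,       # assumed: HSV color jitter strengths (library defaults)
        "hsv_s": 0.7,
        "hsv_v": 0.4,
        "scale": 0.5,         # assumed: scale augmentation (library default)
    }


def fine_tune(data_dir):
    """Fine-tune the pretrained classifier; needs ultralytics and a dataset on disk."""
    from ultralytics import YOLO  # imported lazily so the helper above works without it

    model = YOLO("yolo11x-cls.pt")  # pretrained classification weights
    return model.train(**build_train_args(data_dir))


train_args = build_train_args("path/to/dataset")  # placeholder path
print(train_args["imgsz"], train_args["batch"], train_args["optimizer"])
```

Calling fine_tune("path/to/dataset") would start training. The number of epochs is not given in this card, so it is left at the library default.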

Evaluation

The model was evaluated on a completely unseen test set to ensure a fair assessment of its generalization capabilities.

Key Performance Metrics

Metric	Value	Description
Accuracy	99.80%	Overall correctness on the test set.
APCER	0.00%	Rate of 'Drowsy' drivers missed (False Negatives).
BPCER	0.41%	Rate of 'Non Drowsy' drivers flagged (False Positives).
ACER	0.21%	Average of APCER and BPCER.

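APCER (Attack Presentation Classification Error Rate) and BPCER (Bona Fide Presentation Classification Error Rate) are terms borrowed from biometric presentation attack detection. Here, Drowsy plays the role of the attack class and Non Drowsy the bona fide class. APCER is the most critical safety metric, because a missed drowsy driver is more dangerous than a false alarm.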
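
The error rates in the table follow directly from confusion-matrix counts. The sketch below shows the arithmetic, using the definitions given in the table's Description column. The counts are hypothetical and are not the actual test-set numbers, which this card does not report. The function name error_rates is illustrative.

```python
def error_rates(drowsy_total, drowsy_missed, awake_total, awake_flagged):
    """Compute APCER, BPCER and ACER as fractions between 0 and 1.

    drowsy_missed: Drowsy images classified as Non Drowsy (false negatives)
    awake_flagged: Non Drowsy images classified as Drowsy (false positives)
    """
    apcer = drowsy_missed / drowsy_total
    bpcer = awake_flagged / awake_total
    acer = (apcer + bpcer) / 2
    return apcer, bpcer, acer


# Hypothetical counts for illustration only
apcer, bpcer, acer = error_rates(
    drowsy_total=1000, drowsy_missed=0,
    awake_total=1000, awake_flagged=5,
)
print(f"APCER={apcer:.2%}  BPCER={bpcer:.2%}  ACER={acer:.2%}")
```

Applied to the reported values, (0.00% + 0.41%) / 2 = 0.205%, which is consistent with the ACER of 0.21% in the table after rounding.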

Model Explainability (Grad-CAM)

To check that the model focuses on relevant facial features, Grad-CAM was applied. The heatmaps indicate that the model's predictions are driven primarily by the eye and mouth regions, as expected.
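
For readers unfamiliar with Grad-CAM, the core computation can be sketched in a few lines. This is a generic illustration in plain Python, not the code used in the notebook. It assumes the feature maps of a chosen convolutional layer and the gradients of the target class score with respect to them have already been captured (in PyTorch this is typically done with hooks). Which layer was used here is not stated in this card. The toy values and the function name grad_cam are hypothetical.

```python
def grad_cam(activations, gradients):
    """Combine feature maps into a normalized Grad-CAM heatmap.

    activations, gradients: nested lists shaped [channels][height][width];
    gradients are d(class score) / d(activations).
    """
    height, width = len(activations[0]), len(activations[0][0])

    # Channel weights: average each channel's gradient over all spatial positions
    weights = [sum(map(sum, grad)) / (height * width) for grad in gradients]

    # Weighted sum of the feature maps, followed by ReLU
    cam = [
        [
            max(0.0, sum(w * act[i][j] for w, act in zip(weights, activations)))
            for j in range(width)
        ]
        for i in range(height)
    ]

    # Scale to [0, 1] so the map can be overlaid on the image
    peak = max(map(max, cam))
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]
    return cam


# Toy input: 2 channels on a 2x2 grid (made-up numbers)
toy_activations = [[[1.0, 0.0], [0.0, 2.0]], [[0.0, 3.0], [1.0, 0.0]]]
toy_gradients = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
heatmap = grad_cam(toy_activations, toy_gradients)
print(heatmap)
```

In practice the heatmap is upsampled to the input resolution and overlaid on the image, and the same steps are usually written with tensor operations instead of nested lists.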

Intended Use and Limitations

This model is intended as a proof-of-concept for driver safety systems. It should not be used as the sole mechanism for preventing accidents. Real-world performance may vary based on lighting conditions, camera angles, occlusions (e.g., sunglasses), and individual differences.

This model card is based on the training notebook yolov11_drowsiness.ipynb.