Image Quality Fusion Model

A multi-modal image quality assessment system that combines BRISQUE, Aesthetic Predictor, and CLIP features to predict human-like quality judgments on a 1-10 scale.

🎯 Model Description

This model fuses three complementary approaches to comprehensive image quality assessment:

🔧 BRISQUE (OpenCV): Technical quality assessment detecting blur, noise, compression artifacts, and distortions
🎨 Aesthetic Predictor (LAION): Visual appeal assessment using CLIP ViT-B-32 features trained on human aesthetic ratings
🧠 CLIP (OpenAI): Semantic understanding and high-level feature extraction for content awareness

The fusion network learns optimal weights to combine these diverse quality signals, producing human-like quality judgments that correlate strongly with subjective assessments.

🚀 Quick Start

Installation

pip install torch torchvision huggingface_hub opencv-python pillow open-clip-torch

Basic Usage

from huggingface_hub import PyTorchModelHubMixin
from PIL import Image

# Load the model
model = PyTorchModelHubMixin.from_pretrained("matthewyuan/image-quality-fusion")

# Predict quality for a single image
quality_score = model.predict_quality("path/to/your/image.jpg")
print(f"Image quality: {quality_score:.2f}/10")

# Batch prediction
image_paths = ["image1.jpg", "image2.jpg", "image3.jpg"]
scores = model.predict_batch(image_paths)
for path, score in zip(image_paths, scores):
    print(f"{path}: {score:.2f}/10")

Advanced Usage

# Load with PIL Image
from PIL import Image
image = Image.open("photo.jpg")
score = model.predict_quality(image)

# Works with different input formats
import numpy as np
image_array = np.array(image)
score = model.predict_quality(image_array)

# Get model information
info = model.get_model_info()
print(f"Model: {info['name']} v{info['version']}")
print(f"Performance: Correlation = {info['performance']['correlation']}")

📊 Performance Metrics

Evaluated on the SPAQ dataset (11,125 smartphone images with human quality ratings):

Metric	Value	Description
Pearson Correlation	0.520	Correlation with human judgments
R² Score	0.250	Coefficient of determination
Mean Absolute Error	1.41	Average prediction error (1-10 scale)
Root Mean Square Error	1.69	RMS prediction error

Comparison with Individual Components

Method	Correlation	R² Score	MAE
Fusion Model	0.520	0.250	1.41
BRISQUE Only	0.31	0.12	2.1
Aesthetic Only	0.41	0.18	1.8
CLIP Only	0.28	0.09	2.3

The fusion approach significantly outperforms individual components.

🏗️ Model Architecture

Input Image (RGB)
    ├── OpenCV BRISQUE → Technical Quality Score (0-100, normalized)
    ├── LAION Aesthetic → Aesthetic Score (0-10, normalized) 
    └── OpenAI CLIP-B32 → Semantic Features (512-dimensional)
                ↓
        Feature Fusion Network
        ┌─────────────────────────┐
        │ BRISQUE: 1D → 64 → 128  │
        │ Aesthetic: 1D → 64 → 128│  
        │ CLIP: 512D → 256 → 128  │
        └─────────────────────────┘
                ↓ (concat)
        Deep Fusion Layers (384D → 256D → 128D → 1D)
        Dropout (0.3) + ReLU activations
                ↓
        Human-like Quality Score (1.0 - 10.0)

Technical Details

Input Resolution: Any size (resized to 224×224 for CLIP)
Architecture: Feed-forward neural network with residual connections
Activation Functions: ReLU for hidden layers, Linear for output
Regularization: Dropout (0.3), Early stopping
Output Range: 1.0 - 10.0 (human rating scale)
Parameters: ~2.1M total parameters

🔬 Training Details

Dataset

Name: SPAQ (Smartphone Photography Attribute and Quality)
Size: 11,125 high-resolution smartphone images
Annotations: Human quality ratings (1-10 scale, 5+ annotators per image)
Split: 80% train, 10% validation, 10% test
Domain: Consumer smartphone photography

Training Configuration

Framework: PyTorch 2.0+ with MPS acceleration (M1 optimized)
Optimizer: AdamW (lr=1e-3, weight_decay=1e-4)
Batch Size: 128 (optimized for 32GB unified memory)
Epochs: 50 with early stopping (patience=10)
Loss Function: Mean Squared Error (MSE)
Learning Rate Schedule: ReduceLROnPlateau (factor=0.5, patience=5)
Hardware: M1 MacBook Pro (32GB RAM)
Training Time: ~1 hour (with feature caching)

Optimization Techniques

Mixed Precision Training: MPS autocast for M1 acceleration
Feature Caching: Pre-computed embeddings for 20-30x speedup
Data Loading: Optimized DataLoader (6-8 workers, memory pinning)
Memory Management: Garbage collection every 10 batches
Preprocessing Pipeline: Parallel BRISQUE computation

📱 Use Cases

Professional Applications

Content Management: Automatic quality filtering for large image databases
Social Media: Real-time quality assessment for user uploads
E-commerce: Product image quality validation
Digital Asset Management: Automated quality scoring for photo libraries

Research Applications

Image Quality Research: Benchmark for perceptual quality metrics
Dataset Curation: Quality-based dataset filtering and ranking
Human Perception Studies: Computational model of aesthetic judgment
Multi-modal Learning: Example of successful feature fusion

Creative Applications

Photography Tools: Automated photo rating and selection
Mobile Apps: Real-time quality feedback during capture
Photo Editing: Quality-guided automatic enhancement
Portfolio Management: Intelligent photo organization

⚠️ Limitations and Biases

Model Limitations

Domain Specificity: Trained primarily on smartphone photography
Resolution Dependency: Performance may vary with very low/high resolution images
Cultural Bias: Aesthetic preferences may reflect training data demographics
Temporal Bias: Training data from specific time period may not reflect evolving preferences

Technical Limitations

BRISQUE Scope: May not capture all types of technical degradation
CLIP Bias: Inherits biases from CLIP's training data
Aesthetic Subjectivity: Individual preferences vary significantly
Computational Requirements: Requires GPU for optimal inference speed

Recommended Usage

Validation: Always validate on your specific domain before production use
Human Oversight: Use as a tool to assist, not replace, human judgment
Bias Mitigation: Consider diverse evaluation datasets
Performance Monitoring: Monitor performance on your specific use case

📚 Citation

If you use this model in your research, please cite:

@misc{image-quality-fusion-2024,
  title={Image Quality Fusion: Multi-Modal Assessment with BRISQUE, Aesthetic, and CLIP Features},
  author={Matthew Yuan},
  year={2024},
  howpublished={\url{https://huggingface.co/matthewyuan/image-quality-fusion}},
  note={Trained on SPAQ dataset, deployed via GitHub Actions CI/CD}
}

🔗 Related Work

Datasets

SPAQ Dataset - Smartphone Photography Attribute and Quality
AVA Dataset - Aesthetic Visual Analysis
LIVE IQA - Laboratory for Image & Video Engineering

Models

LAION Aesthetic Predictor - Aesthetic scoring model
OpenCLIP - Open source CLIP implementation
BRISQUE - Blind/Referenceless Image Spatial Quality Evaluator

🛠️ Development

Local Development

# Clone repository
git clone https://github.com/mattkyuan/image-quality-fusion.git
cd image-quality-fusion

# Install dependencies  
pip install -r requirements.txt

# Run training
python src/image_quality_fusion/training/train_fusion.py \
    --image_dir data/images \
    --annotations data/annotations.csv \
    --prepare_data \
    --epochs 50

CI/CD Pipeline

This model is automatically deployed via GitHub Actions:

Training Pipeline: Automated model training on code changes
Deployment Pipeline: Automatic HF Hub deployment on model updates
Testing Pipeline: Comprehensive model validation and testing

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

SPAQ Dataset: H4nwei et al. for the comprehensive smartphone photography dataset
LAION: For the aesthetic predictor model and training methodology
OpenAI: For CLIP model architecture and pre-trained weights
OpenCV: For BRISQUE implementation and computer vision tools
Hugging Face: For model hosting and deployment infrastructure
PyTorch Team: For the deep learning framework and MPS acceleration

📞 Contact

Repository: github.com/mattkyuan/image-quality-fusion
Issues: GitHub Issues
Hugging Face: matthewyuan/image-quality-fusion

This model was trained and deployed using automated CI/CD pipelines for reproducible ML workflows.

matthewyuan
/

image-quality-fusion