# Image Quality Fusion Model

A multi-modal image quality assessment system that combines BRISQUE, Aesthetic Predictor, and CLIP features to predict human-like quality judgments on a 1-10 scale.
## Model Description

This model fuses three complementary approaches to comprehensive image quality assessment:

- **BRISQUE (OpenCV)**: Technical quality assessment detecting blur, noise, compression artifacts, and distortions
- **Aesthetic Predictor (LAION)**: Visual appeal assessment using CLIP ViT-B-32 features trained on human aesthetic ratings
- **CLIP (OpenAI)**: Semantic understanding and high-level feature extraction for content awareness
The fusion network learns how to combine these complementary signals, producing quality judgments that correlate with subjective human assessments (Pearson r = 0.520 on SPAQ; see the performance metrics below).
## Quick Start

### Installation

```bash
pip install torch torchvision huggingface_hub opencv-python pillow open-clip-torch
```
### Basic Usage

```python
from huggingface_hub import PyTorchModelHubMixin

# Load the model. from_pretrained is provided by PyTorchModelHubMixin;
# the repository's model class subclasses the mixin, so in practice the
# call is made on that class.
model = PyTorchModelHubMixin.from_pretrained("matthewyuan/image-quality-fusion")

# Predict quality for a single image
quality_score = model.predict_quality("path/to/your/image.jpg")
print(f"Image quality: {quality_score:.2f}/10")

# Batch prediction
image_paths = ["image1.jpg", "image2.jpg", "image3.jpg"]
scores = model.predict_batch(image_paths)
for path, score in zip(image_paths, scores):
    print(f"{path}: {score:.2f}/10")
```
### Advanced Usage

```python
from PIL import Image
import numpy as np

# Load with a PIL Image
image = Image.open("photo.jpg")
score = model.predict_quality(image)

# NumPy arrays work too
image_array = np.array(image)
score = model.predict_quality(image_array)

# Get model information
info = model.get_model_info()
print(f"Model: {info['name']} v{info['version']}")
print(f"Performance: Correlation = {info['performance']['correlation']}")
```
## Performance Metrics

Evaluated on the SPAQ dataset (11,125 smartphone images with human quality ratings):

| Metric | Value | Description |
|---|---|---|
| Pearson Correlation | 0.520 | Correlation with human judgments |
| R² Score | 0.250 | Coefficient of determination |
| Mean Absolute Error | 1.41 | Average prediction error (1-10 scale) |
| Root Mean Square Error | 1.69 | RMS prediction error |
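The four metrics above can be recomputed from raw predictions with plain NumPy. A minimal sketch, where `preds` and `labels` are illustrative placeholder arrays rather than real evaluation data:

```python
import numpy as np

preds = np.array([6.2, 4.8, 7.9, 3.1])   # model outputs (1-10 scale), placeholders
labels = np.array([6.5, 5.0, 7.0, 4.0])  # human ratings (1-10 scale), placeholders

# Pearson correlation between predictions and human judgments
pearson_r = np.corrcoef(preds, labels)[0, 1]

# R^2: 1 - residual sum of squares / total sum of squares
ss_res = np.sum((labels - preds) ** 2)
ss_tot = np.sum((labels - labels.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

mae = np.mean(np.abs(preds - labels))           # mean absolute error
rmse = np.sqrt(np.mean((preds - labels) ** 2))  # root mean square error

print(f"r={pearson_r:.3f}  R2={r2:.3f}  MAE={mae:.2f}  RMSE={rmse:.2f}")
```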
### Comparison with Individual Components

| Method | Correlation | R² Score | MAE |
|---|---|---|---|
| Fusion Model | 0.520 | 0.250 | 1.41 |
| BRISQUE Only | 0.31 | 0.12 | 2.1 |
| Aesthetic Only | 0.41 | 0.18 | 1.8 |
| CLIP Only | 0.28 | 0.09 | 2.3 |
The fusion approach outperforms each individual component on all three metrics.
## Model Architecture

```
Input Image (RGB)
 ├── OpenCV BRISQUE  → Technical Quality Score (0-100, normalized)
 ├── LAION Aesthetic → Aesthetic Score (0-10, normalized)
 └── OpenAI CLIP-B32 → Semantic Features (512-dimensional)
          ↓
   Feature Fusion Network
 ┌────────────────────────────┐
 │ BRISQUE:    1D →  64 → 128 │
 │ Aesthetic:  1D →  64 → 128 │
 │ CLIP:     512D → 256 → 128 │
 └────────────────────────────┘
          ↓ (concat)
 Deep Fusion Layers (384D → 256D → 128D → 1D)
 Dropout (0.3) + ReLU activations
          ↓
 Human-like Quality Score (1.0 - 10.0)
```
### Technical Details

- **Input Resolution**: Any size (resized to 224×224 for CLIP)
- **Architecture**: Feed-forward neural network with residual connections
- **Activation Functions**: ReLU for hidden layers, linear output
- **Regularization**: Dropout (0.3), early stopping
- **Output Range**: 1.0 - 10.0 (human rating scale)
- **Parameters**: ~2.1M total
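To make the wiring concrete, here is a minimal PyTorch sketch of the fusion network following the layer sizes in the diagram above. It omits the residual connections and input normalization, and the class name `FusionSketch` is illustrative, not the repository's API:

```python
import torch
import torch.nn as nn

class FusionSketch(nn.Module):
    def __init__(self, clip_dim=512, dropout=0.3):
        super().__init__()
        # Per-modality encoders: scalar scores are lifted to 128-D,
        # CLIP features are compressed to 128-D.
        self.brisque = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, 128), nn.ReLU())
        self.aesthetic = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, 128), nn.ReLU())
        self.clip = nn.Sequential(nn.Linear(clip_dim, 256), nn.ReLU(), nn.Linear(256, 128), nn.ReLU())
        # Deep fusion head: 384 -> 256 -> 128 -> 1 with dropout regularization
        self.head = nn.Sequential(
            nn.Linear(384, 256), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(256, 128), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(128, 1),  # linear output, trained against 1-10 ratings
        )

    def forward(self, brisque_score, aesthetic_score, clip_features):
        fused = torch.cat([
            self.brisque(brisque_score),      # (B, 128)
            self.aesthetic(aesthetic_score),  # (B, 128)
            self.clip(clip_features),         # (B, 128)
        ], dim=-1)                            # (B, 384) after concat
        return self.head(fused).squeeze(-1)   # (B,) quality scores

model = FusionSketch()
score = model(torch.rand(2, 1), torch.rand(2, 1), torch.randn(2, 512))
```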
## Training Details

### Dataset

- **Name**: SPAQ (Smartphone Photography Attribute and Quality)
- **Size**: 11,125 high-resolution smartphone images
- **Annotations**: Human quality ratings (1-10 scale, 5+ annotators per image)
- **Split**: 80% train, 10% validation, 10% test
- **Domain**: Consumer smartphone photography
### Training Configuration

- **Framework**: PyTorch 2.0+ with MPS acceleration (M1 optimized)
- **Optimizer**: AdamW (lr=1e-3, weight_decay=1e-4)
- **Batch Size**: 128 (optimized for 32GB unified memory)
- **Epochs**: 50 with early stopping (patience=10)
- **Loss Function**: Mean squared error (MSE)
- **Learning Rate Schedule**: ReduceLROnPlateau (factor=0.5, patience=5)
- **Hardware**: M1 MacBook Pro (32GB RAM)
- **Training Time**: ~1 hour (with feature caching)
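This configuration maps directly onto standard PyTorch components. A hedged sketch of the training loop, reusing the `FusionSketch` module from the architecture section and synthetic tensors in place of the real SPAQ dataloaders:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in data: (brisque, aesthetic, clip, rating) tuples
n = 256
data = TensorDataset(torch.rand(n, 1), torch.rand(n, 1),
                     torch.randn(n, 512), torch.rand(n) * 9 + 1)
train_loader = DataLoader(data, batch_size=128, shuffle=True)
val_loader = DataLoader(data, batch_size=128)

model = FusionSketch()  # from the architecture sketch above
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.5, patience=5)
loss_fn = torch.nn.MSELoss()

best_val, bad_epochs, patience = float("inf"), 0, 10
for epoch in range(50):
    model.train()
    for b, a, c, y in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(b, a, c), y)
        loss.backward()
        optimizer.step()

    model.eval()
    with torch.no_grad():
        val_loss = sum(loss_fn(model(b, a, c), y).item() for b, a, c, y in val_loader)
    scheduler.step(val_loss)           # halve LR when validation plateaus
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:     # early stopping (patience = 10)
            break
```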
### Optimization Techniques

- **Mixed Precision Training**: MPS autocast for M1 acceleration
- **Feature Caching**: Pre-computed embeddings for a 20-30x speedup (see the sketch after this list)
- **Data Loading**: Optimized DataLoader (6-8 workers, memory pinning)
- **Memory Management**: Garbage collection every 10 batches
- **Preprocessing Pipeline**: Parallel BRISQUE computation
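Most of the caching speedup comes from computing the expensive per-image features once and reusing them on every epoch. An illustrative sketch; `extract_features` is a hypothetical stand-in for the repository's actual BRISQUE/aesthetic/CLIP preprocessing:

```python
from pathlib import Path
import torch

def extract_features(image_path: str) -> dict:
    # Placeholder: the real pipeline would run BRISQUE, the aesthetic
    # predictor, and CLIP on the image here.
    return {"brisque": torch.rand(1), "aesthetic": torch.rand(1),
            "clip": torch.randn(512)}

def cached_features(image_path: str, cache_dir: str = "feature_cache") -> dict:
    cache = Path(cache_dir)
    cache.mkdir(exist_ok=True)
    cache_file = cache / (Path(image_path).stem + ".pt")
    if cache_file.exists():
        return torch.load(cache_file)    # cache hit: skip recomputation
    feats = extract_features(image_path)
    torch.save(feats, cache_file)        # cache miss: compute once, persist
    return feats
```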
## Use Cases

### Professional Applications

- **Content Management**: Automatic quality filtering for large image databases (example below)
- **Social Media**: Real-time quality assessment for user uploads
- **E-commerce**: Product image quality validation
- **Digital Asset Management**: Automated quality scoring for photo libraries
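As a concrete example of the quality-filtering use case above, a short sketch using the `predict_batch` API from Quick Start; the directory and the 6.0 threshold are illustrative:

```python
from pathlib import Path

# `model` loaded as in Quick Start
image_paths = [str(p) for p in Path("photo_library").glob("*.jpg")]
scores = model.predict_batch(image_paths)

# Keep only images at or above an illustrative quality threshold
keepers = [p for p, s in zip(image_paths, scores) if s >= 6.0]
print(f"Kept {len(keepers)} of {len(image_paths)} images")
```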
### Research Applications

- **Image Quality Research**: Benchmark for perceptual quality metrics
- **Dataset Curation**: Quality-based dataset filtering and ranking
- **Human Perception Studies**: Computational model of aesthetic judgment
- **Multi-modal Learning**: Example of successful feature fusion

### Creative Applications

- **Photography Tools**: Automated photo rating and selection
- **Mobile Apps**: Real-time quality feedback during capture
- **Photo Editing**: Quality-guided automatic enhancement
- **Portfolio Management**: Intelligent photo organization
## Limitations and Biases

### Model Limitations

- **Domain Specificity**: Trained primarily on smartphone photography
- **Resolution Dependency**: Performance may vary with very low/high resolution images
- **Cultural Bias**: Aesthetic preferences may reflect training data demographics
- **Temporal Bias**: Training data from a specific time period may not reflect evolving preferences

### Technical Limitations

- **BRISQUE Scope**: May not capture all types of technical degradation
- **CLIP Bias**: Inherits biases from CLIP's training data
- **Aesthetic Subjectivity**: Individual preferences vary significantly
- **Computational Requirements**: Requires a GPU for optimal inference speed

### Recommended Usage

- **Validation**: Always validate on your specific domain before production use
- **Human Oversight**: Use as a tool to assist, not replace, human judgment
- **Bias Mitigation**: Consider diverse evaluation datasets
- **Performance Monitoring**: Monitor performance on your specific use case
## Citation

If you use this model in your research, please cite:

```bibtex
@misc{image-quality-fusion-2024,
  title={Image Quality Fusion: Multi-Modal Assessment with BRISQUE, Aesthetic, and CLIP Features},
  author={Matthew Yuan},
  year={2024},
  howpublished={\url{https://huggingface.co/matthewyuan/image-quality-fusion}},
  note={Trained on SPAQ dataset, deployed via GitHub Actions CI/CD}
}
```
## Related Work

### Datasets

- **SPAQ Dataset** - Smartphone Photography Attribute and Quality
- **AVA Dataset** - Aesthetic Visual Analysis
- **LIVE IQA** - Laboratory for Image & Video Engineering

### Models

- **LAION Aesthetic Predictor** - Aesthetic scoring model
- **OpenCLIP** - Open-source CLIP implementation
- **BRISQUE** - Blind/Referenceless Image Spatial Quality Evaluator
## Development

### Local Development

```bash
# Clone repository
git clone https://github.com/mattkyuan/image-quality-fusion.git
cd image-quality-fusion

# Install dependencies
pip install -r requirements.txt

# Run training
python src/image_quality_fusion/training/train_fusion.py \
    --image_dir data/images \
    --annotations data/annotations.csv \
    --prepare_data \
    --epochs 50
```
### CI/CD Pipeline

This model is automatically deployed via GitHub Actions:

- **Training Pipeline**: Automated model training on code changes
- **Deployment Pipeline**: Automatic HF Hub deployment on model updates
- **Testing Pipeline**: Comprehensive model validation and testing
## License

This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments

- **SPAQ Dataset**: Fang et al. for the comprehensive smartphone photography dataset
- **LAION**: For the aesthetic predictor model and training methodology
- **OpenAI**: For the CLIP model architecture and pre-trained weights
- **OpenCV**: For the BRISQUE implementation and computer vision tools
- **Hugging Face**: For model hosting and deployment infrastructure
- **PyTorch Team**: For the deep learning framework and MPS acceleration
## Contact

- **Repository**: github.com/mattkyuan/image-quality-fusion
- **Issues**: GitHub Issues
- **Hugging Face**: matthewyuan/image-quality-fusion

*This model was trained and deployed using automated CI/CD pipelines for reproducible ML workflows.*