---
library_name: transformers
tags:
- computer-vision
- image-classification
- vit
- deepfake-detection
- binary-classification
- pytorch
license: mit
metrics:
- accuracy
base_model:
- facebook/deit-base-distilled-patch16-224
pipeline_tag: image-classification
---
# Model Card for Virtus
Virtus is a fine-tuned Vision Transformer (ViT) model for binary image classification, specifically trained to distinguish between real and deepfake images. It achieves **~99.2% accuracy** on a balanced dataset of over 190,000 images.
## Model Details
### Model Description
Virtus is based on `facebook/deit-base-distilled-patch16-224` and was fine-tuned on a binary classification task using a large dataset of real and fake facial images. The training process involved class balancing, data augmentation, and evaluation using accuracy and F1 score.
- **Developed by:** [Agasta](https://github.com/Itz-Agasta)
- **Funded by:** None
- **Shared by:** Agasta
- **Model type:** Vision Transformer (ViT) for image classification
- **Language(s):** N/A (vision model)
- **License:** MIT
- **Finetuned from model:** [facebook/deit-base-distilled-patch16-224](https://huggingface.co/facebook/deit-base-distilled-patch16-224)
### Model Sources
- **Repository:** [https://huggingface.co/agasta/virtus](https://huggingface.co/agasta/virtus)
## Uses
### Direct Use
This model can be used to predict whether an input image is real or a deepfake. It can be deployed in image analysis pipelines or integrated into applications that require media authenticity detection.
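For quick checks, the Transformers `pipeline` API bundles preprocessing and inference into a single call. A minimal sketch; the file path is a placeholder:

```python
from transformers import pipeline

# Image-classification pipeline backed by the fine-tuned checkpoint
detector = pipeline("image-classification", model="agasta/virtus")

# Accepts a file path, URL, or PIL image; returns [{label, score}, ...]
print(detector("path_to_image.jpg"))
```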
### Downstream Use
Virtus may be used in broader deepfake detection systems, educational tools for detecting synthetic media, or pre-screening systems for online platforms.
### Out-of-Scope Use
- Detection of deepfakes in videos or audio
- General object classification tasks outside of the real/fake binary domain
## Bias, Risks, and Limitations
The dataset, while balanced, may still carry biases in facial features, lighting conditions, or demographics. The model is also not robust to non-standard input sizes or heavily occluded faces.
### Recommendations
- Use only on face images similar in nature to the training set.
- Do not use for critical or high-stakes decisions without human verification; a simple confidence gate is sketched after the example below.
- Regularly re-evaluate performance with updated data.
## How to Get Started with the Model
```python
from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import torch

model = AutoModelForImageClassification.from_pretrained("agasta/virtus")
processor = AutoImageProcessor.from_pretrained("agasta/virtus")

# Convert to RGB so RGBA or greyscale inputs do not break preprocessing
image = Image.open("path_to_image.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

# Inference only: no gradients needed
with torch.no_grad():
    outputs = model(**inputs)

predicted_class = outputs.logits.argmax(-1).item()
print(model.config.id2label[predicted_class])
```
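Following the human-verification recommendation above, a hypothetical confidence gate can be layered on top of the logits from the snippet above. The 0.95 threshold is an assumption, not part of the released model; tune it on your own validation data:

```python
# Hypothetical confidence gate; `outputs` comes from the snippet above
probs = outputs.logits.softmax(-1).squeeze()
score, idx = probs.max(-1)
if score.item() >= 0.95:  # assumed threshold, not part of the released model
    print(model.config.id2label[idx.item()])
else:
    print("Low confidence: escalate to human review")
```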
## Training Details
### Training Data
The dataset consisted of 190,335 self-collected real and deepfake face images. `RandomOverSampler` (from the `imbalanced-learn` library) was used to balance the two classes, and the data was split into 60% training and 40% testing with class stratification.
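A minimal sketch of how such balancing and splitting could be wired up with `imbalanced-learn` and scikit-learn; the file paths and labels below are placeholders, not the actual dataset:

```python
import numpy as np
from imblearn.over_sampling import RandomOverSampler
from sklearn.model_selection import train_test_split

# Placeholder data: file paths with binary labels (0 = real, 1 = fake)
paths = np.array(["real_0.jpg", "real_1.jpg", "real_2.jpg", "fake_0.jpg"])
labels = np.array([0, 0, 0, 1])

# Oversample the minority class so both classes have equal counts
ros = RandomOverSampler(random_state=42)
paths_res, labels_res = ros.fit_resample(paths.reshape(-1, 1), labels)

# Stratified 60/40 train/test split, preserving the class ratio
train_paths, test_paths, train_labels, test_labels = train_test_split(
    paths_res.ravel(), labels_res, test_size=0.4, stratify=labels_res, random_state=42
)
```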
### Training Procedure
#### Preprocessing
- Images resized to 224x224
- Augmentations: Random rotation, sharpness adjustments, normalization
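Expressed with `torchvision`, the preprocessing above might look like the following sketch. The rotation range and sharpness factor are assumptions, and the normalization statistics are taken to be the ImageNet defaults used by the base DeiT checkpoint:

```python
from torchvision import transforms

# Assumed training-time pipeline; exact parameter values are illustrative
train_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomRotation(degrees=15),                 # assumed range
    transforms.RandomAdjustSharpness(sharpness_factor=2),  # assumed factor
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],       # ImageNet defaults
                         std=[0.229, 0.224, 0.225]),
])
```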
#### Training Hyperparameters
- **Epochs:** 2
- **Learning rate:** 1e-6
- **Train batch size:** 32
- **Eval batch size:** 8
- **Weight decay:** 0.02
- **Optimizer:** AdamW (via Trainer API)
- **Mixed precision:** Not used
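Mapped onto the Trainer API, the hyperparameters above correspond roughly to the following `TrainingArguments`. A sketch: `output_dir` is a placeholder, and AdamW is simply the Trainer's default optimizer:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="virtus-checkpoints",   # placeholder output path
    num_train_epochs=2,
    learning_rate=1e-6,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=8,
    weight_decay=0.02,                 # applied by the default AdamW optimizer
    fp16=False,                        # mixed precision not used
)
```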
## Evaluation
### Testing Data
The held-out 40% test portion of the stratified 60:40 split described above was used for evaluation.
### Metrics
- **Accuracy**
- **F1 Score (macro)**
- **Confusion matrix**
- **Classification report**
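A `compute_metrics` function covering these could look like the following sketch; the label order in `target_names` is an assumption:

```python
import numpy as np
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, f1_score)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    # Print diagnostics; the Trainer logs only the returned dict
    print(confusion_matrix(labels, preds))
    print(classification_report(labels, preds,
                                target_names=["Real", "Fake"]))  # assumed order
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1_macro": f1_score(labels, preds, average="macro"),
    }
```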
### Results
- **Accuracy:** 99.20%
- **F1 Score (macro):** 0.9920
## Environmental Impact
- **Hardware Type:** NVIDIA Tesla V100 (Kaggle Notebook GPU)
- **Hours used:** ~2.3
- **Cloud Provider:** Kaggle
- **Compute Region:** Unknown
- **Carbon Emitted:** Can be estimated via [MLCO2 Calculator](https://mlco2.github.io/impact#compute)
## Technical Specifications
### Model Architecture and Objective
The model is a distilled Vision Transformer (DeiT) designed for image classification with a binary objective: classify images as Real or Fake.
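The binary head is visible directly in the released config; a quick check (label names are whatever the checkpoint defines):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("agasta/virtus")
print(config.num_labels)  # expected: 2
print(config.id2label)    # index-to-label mapping, e.g. Real/Fake
```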
### Compute Infrastructure
- **Hardware:** 1x NVIDIA Tesla V100 GPU
- **Software:** PyTorch, Hugging Face Transformers, Datasets, Accelerate
## Citation
**BibTeX:**
```bibtex
@misc{virtus2025,
  title={Virtus: Deepfake Detection using Vision Transformers},
  author={Agasta},
  year={2025},
  howpublished={\url{https://huggingface.co/agasta/virtus}},
}
```
**APA:**
Agasta. (2025). *Virtus: Deepfake Detection using Vision Transformers*. Hugging Face. https://huggingface.co/agasta/virtus
## Model Card Contact
For questions or feedback, reach out via [GitHub](https://github.com/Itz-Agasta), open an issue on the [model repository](https://github.com/Itz-Agasta/Lopt/tree/main/models/image), or email [email protected].