|
--- |
|
library_name: transformers |
|
tags: |
|
- computer-vision |
|
- image-classification |
|
- vit |
|
- deepfake-detection |
|
- binary-classification |
|
- pytorch |
|
license: mit |
|
metrics: |
|
- accuracy
|
base_model: |
|
- facebook/deit-base-distilled-patch16-224 |
|
pipeline_tag: image-classification |
|
--- |
|
|
|
|
|
# Model Card for Virtus |
|
|
|
Virtus is a fine-tuned Vision Transformer (ViT) model for binary image classification, trained to distinguish real images from deepfakes. It achieves **~99.2% accuracy** on a held-out test split of a balanced dataset of over 190,000 face images.
|
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
Virtus is based on `facebook/deit-base-distilled-patch16-224` and was fine-tuned on a binary classification task using a large dataset of real and fake facial images. The training process involved class balancing, data augmentation, and evaluation using accuracy and F1 score. |
|
|
|
- **Developed by:** [Agasta](https://github.com/Itz-Agasta) |
|
- **Funded by:** None |
|
- **Shared by:** Agasta |
|
- **Model type:** Vision Transformer (ViT) for image classification |
|
- **Language(s):** N/A (vision model) |
|
- **License:** MIT |
|
- **Finetuned from model:** [facebook/deit-base-distilled-patch16-224](https://huggingface.co/facebook/deit-base-distilled-patch16-224) |
|
|
|
### Model Sources |
|
|
|
- **Repository:** [https://huggingface.co/agasta/virtus](https://huggingface.co/agasta/virtus) |
|
|
|
## Uses |
|
|
|
### Direct Use |
|
|
|
This model predicts whether an input image is real or a deepfake. It can be deployed in image analysis pipelines or integrated into applications that require media authenticity detection.
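For quick checks, the Transformers `pipeline` API wraps preprocessing and inference in one call. A minimal sketch (the exact label strings depend on the checkpoint's `id2label` mapping, so the output shown is illustrative):

```python
from transformers import pipeline

# Build an image-classification pipeline around the Virtus checkpoint.
classifier = pipeline("image-classification", model="agasta/virtus")

# Accepts a local file path, a URL, or a PIL image.
result = classifier("path_to_image.jpg")
print(result)  # e.g. [{'label': 'Fake', 'score': 0.99}, {'label': 'Real', 'score': 0.01}]
```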
|
|
|
### Downstream Use |
|
|
|
Virtus may be used in broader deepfake detection systems, educational tools for detecting synthetic media, or pre-screening systems for online platforms. |
|
|
|
### Out-of-Scope Use |
|
|
|
- Detection of deepfakes in videos or audio |
|
- General object classification tasks outside of the real/fake binary domain |
|
|
|
## Bias, Risks, and Limitations |
|
|
|
The dataset, while balanced, may still carry biases in facial features, lighting conditions, or demographics. The model is also not robust to non-standard input sizes or heavily occluded faces. |
|
|
|
### Recommendations |
|
|
|
- Use only on face images similar in nature to the training set. |
|
- Do not use for critical or high-stakes decisions without human verification. |
|
- Regularly re-evaluate performance with updated data. |
|
|
|
## How to Get Started with the Model |
|
|
|
```python |
|
from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import torch

# Load the fine-tuned model and its image processor.
model = AutoModelForImageClassification.from_pretrained("agasta/virtus")
processor = AutoImageProcessor.from_pretrained("agasta/virtus")

# Preprocess a single image and run inference.
image = Image.open("path_to_image.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

predicted_class = outputs.logits.argmax(-1).item()
print(model.config.id2label[predicted_class])
|
``` |
|
|
|
## Training Details |
|
|
|
### Training Data |
|
|
|
The dataset consisted of 190,335 self-collected real and deepfake face images, with RandomOverSampler used to balance the two classes. The data was then split 60/40 into training and test sets with class stratification.
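A minimal sketch of this balancing and splitting step, assuming `imbalanced-learn` and scikit-learn; the file lists below are toy placeholders, not the actual dataset:

```python
import numpy as np
from imblearn.over_sampling import RandomOverSampler
from sklearn.model_selection import train_test_split

# Toy stand-ins for the real image paths and labels (0 = real, 1 = fake).
paths = np.array([f"img_{i}.jpg" for i in range(10)]).reshape(-1, 1)
labels = np.array([0] * 7 + [1] * 3)  # deliberately imbalanced

# Duplicate minority-class samples until both classes are equal in size.
ros = RandomOverSampler(random_state=42)
paths_bal, labels_bal = ros.fit_resample(paths, labels)

# 60/40 stratified train/test split, as described above.
train_paths, test_paths, train_labels, test_labels = train_test_split(
    paths_bal, labels_bal, test_size=0.4, stratify=labels_bal, random_state=42
)
```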
|
|
|
### Training Procedure |
|
|
|
#### Preprocessing |
|
- Images resized to 224×224

- Augmentations: random rotation, sharpness adjustment, and normalization (a possible implementation is sketched below)
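A possible torchvision pipeline matching the steps above; the rotation range, sharpness factor, and normalization statistics (ImageNet defaults here) are assumptions, not confirmed training values:

```python
from torchvision import transforms

train_transforms = transforms.Compose([
    transforms.Resize((224, 224)),                         # match the ViT input size
    transforms.RandomRotation(degrees=15),                 # assumed rotation range
    transforms.RandomAdjustSharpness(sharpness_factor=2),  # assumed sharpness factor
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],       # ImageNet statistics
                         std=[0.229, 0.224, 0.225]),
])
```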
|
|
|
#### Training Hyperparameters |
|
|
|
- **Epochs:** 2 |
|
- **Learning rate:** 1e-6 |
|
- **Train batch size:** 32 |
|
- **Eval batch size:** 8 |
|
- **Weight decay:** 0.02 |
|
- **Optimizer:** AdamW (via Trainer API) |
|
- **Mixed precision:** Not used |
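A sketch of how these settings map onto `TrainingArguments`; values not listed above (such as `output_dir`) are placeholders:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="virtus-checkpoints",  # placeholder name
    num_train_epochs=2,
    learning_rate=1e-6,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=8,
    weight_decay=0.02,
    fp16=False,  # mixed precision not used
)
# The Trainer API uses AdamW as its default optimizer, so no explicit
# optimizer configuration is needed for the setup above.
```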
|
|
|
|
|
|
|
## Evaluation |
|
|
|
### Testing Data |
|
|
|
The held-out 40% portion of the stratified 60/40 split described above was used for evaluation.
|
|
|
### Metrics |
|
|
|
- **Accuracy** |
|
- **F1 Score (macro)** |
|
- **Confusion matrix** |
|
- **Classification report** |
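A minimal sketch of how these metrics could be computed with scikit-learn; `y_true` and `y_pred` are toy placeholders for the real evaluation outputs:

```python
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    f1_score,
)

# Toy labels: 0 = Real, 1 = Fake.
y_true = [0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 0]

print("Accuracy:", accuracy_score(y_true, y_pred))
print("Macro F1:", f1_score(y_true, y_pred, average="macro"))
print(confusion_matrix(y_true, y_pred))
print(classification_report(y_true, y_pred, target_names=["Real", "Fake"]))
```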
|
|
|
### Results |
|
|
|
- **Accuracy:** 99.20% |
|
- **F1 Score (macro):** 0.9920 |
|
|
|
## Environmental Impact |
|
|
|
- **Hardware Type:** NVIDIA Tesla V100 (Kaggle Notebook GPU) |
|
- **Hours used:** ~2.3
|
- **Cloud Provider:** Kaggle |
|
- **Compute Region:** Unknown |
|
- **Carbon Emitted:** Can be estimated via [MLCO2 Calculator](https://mlco2.github.io/impact#compute) |
|
|
|
## Technical Specifications |
|
|
|
### Model Architecture and Objective |
|
|
|
The model is a distilled Vision Transformer (DeiT) designed for image classification with a binary objective: classify images as Real or Fake. |
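A quick sanity check of the binary classification head, assuming the checkpoint exposes the standard config fields:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("agasta/virtus")
print(config.num_labels)  # expected: 2
print(config.id2label)    # mapping of class indices to Real/Fake labels
```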
|
|
|
### Compute Infrastructure |
|
|
|
- **Hardware:** 1x NVIDIA Tesla V100 GPU |
|
- **Software:** PyTorch, Hugging Face Transformers, Datasets, Accelerate |
|
|
|
## Citation |
|
|
|
**BibTeX:** |
|
```bibtex |
|
@misc{virtus2025,
  title        = {Virtus: Deepfake Detection using Vision Transformers},
  author       = {Agasta},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/agasta/virtus}},
}
|
``` |
|
|
|
**APA:** |
|
Agasta. (2025). *Virtus: Deepfake Detection using Vision Transformers*. Hugging Face. https://huggingface.co/agasta/virtus |
|
|
|
## Model Card Contact |
|
|
|
For questions or feedback, reach out via [GitHub](https://github.com/Itz-Agasta), open an issue on the [model repository](https://github.com/Itz-Agasta/Lopt/tree/main/models/image), or email [email protected].