---
library_name: transformers
tags:
- computer-vision
- image-classification
- vit
- deepfake-detection
- binary-classification
- pytorch
license: mit
metrics:
- accuracy: 99.20%
base_model:
- facebook/deit-base-distilled-patch16-224
pipeline_tag: image-classification
---
# Model Card for Virtus
Virtus is a fine-tuned Vision Transformer (ViT) model for binary image classification, specifically trained to distinguish between real and deepfake images. It achieves **~99.2% accuracy** on a balanced dataset of over 190,000 images.
## Model Details
### Model Description
Virtus is based on `facebook/deit-base-distilled-patch16-224` and was fine-tuned on a binary classification task using a large dataset of real and fake facial images. The training process involved class balancing, data augmentation, and evaluation using accuracy and F1 score.
- **Developed by:** [Agasta](https://github.com/Itz-Agasta)
- **Funded by:** None
- **Shared by:** Agasta
- **Model type:** Vision Transformer (ViT) for image classification
- **Language(s):** N/A (vision model)
- **License:** MIT
- **Finetuned from model:** [facebook/deit-base-distilled-patch16-224](https://huggingface.co/facebook/deit-base-distilled-patch16-224)
### Model Sources
- **Repository:** [https://huggingface.co/agasta/virtus](https://huggingface.co/agasta/virtus)
## Uses
### Direct Use
This model can be used to predict whether an input image is real or a deepfake. It can be deployed in image analysis pipelines or integrated into applications that require media authenticity detection.
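For quick experimentation, the checkpoint can be loaded through the `transformers` pipeline API. The sketch below assumes a local image file; see *How to Get Started with the Model* below for the explicit processor/model version.

```python
from transformers import pipeline

# Load the fine-tuned checkpoint into an image-classification pipeline.
detector = pipeline("image-classification", model="agasta/virtus")

# "photo.jpg" is a placeholder path; a URL or a PIL.Image also works.
print(detector("photo.jpg"))  # e.g. [{'label': ..., 'score': ...}, ...]
```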
### Downstream Use
Virtus may be used in broader deepfake detection systems, educational tools for detecting synthetic media, or pre-screening systems for online platforms.
### Out-of-Scope Use
- Detection of deepfakes in videos or audio
- General object classification tasks outside of the real/fake binary domain
## Bias, Risks, and Limitations
The dataset, while balanced, may still carry biases in facial features, lighting conditions, or demographics. The model is also not robust to non-standard input sizes or heavily occluded faces.
### Recommendations
- Use only on face images similar in nature to the training set.
- Do not use for critical or high-stakes decisions without human verification; see the confidence-threshold sketch after this list for one way to route uncertain predictions to a reviewer.
- Regularly re-evaluate performance with updated data.
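As one illustration of the second recommendation, here is a minimal sketch that routes low-confidence predictions to a human reviewer. The 0.90 threshold and the image path are assumptions; tune the threshold on your own validation data.

```python
from transformers import pipeline

detector = pipeline("image-classification", model="agasta/virtus")

REVIEW_THRESHOLD = 0.90  # assumed cut-off; tune on held-out data

# The pipeline returns labels sorted by score, so the first entry is the top prediction.
top = detector("face.jpg")[0]  # "face.jpg" is a placeholder path
if top["score"] < REVIEW_THRESHOLD:
    print(f"Low confidence ({top['score']:.2f}) for {top['label']!r}: send to human review")
else:
    print(f"{top['label']} ({top['score']:.2f})")
```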
## How to Get Started with the Model
```python
from transformers import AutoFeatureExtractor, AutoModelForImageClassification
from PIL import Image
import torch

# Load the fine-tuned checkpoint and its preprocessing configuration.
model = AutoModelForImageClassification.from_pretrained("agasta/virtus")
extractor = AutoFeatureExtractor.from_pretrained("agasta/virtus")

# Open the image and make sure it has three channels.
image = Image.open("path_to_image.jpg").convert("RGB")
inputs = extractor(images=image, return_tensors="pt")

# Run inference without tracking gradients.
with torch.no_grad():
    outputs = model(**inputs)

predicted_class = outputs.logits.argmax(-1).item()
print(model.config.id2label[predicted_class])  # e.g. "Real" or "Fake"
```
## Training Details
### Training Data
The dataset consisted of 190,335 self-collected real and deepfake face images, with RandomOverSampler used to balance the two classes. The data was split into 60% training and 40% testing, maintaining class stratification.
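A rough sketch of that balancing and split, assuming the images are referenced by file path and labeled 0 = Real, 1 = Fake; the placeholder paths and `random_state` values are assumptions, as the card does not state the seed.

```python
import numpy as np
from imblearn.over_sampling import RandomOverSampler
from sklearn.model_selection import train_test_split

# Placeholder arrays standing in for the collected dataset (shape: one path per row).
image_paths = np.array([["real_0001.jpg"], ["real_0002.jpg"], ["real_0003.jpg"],
                        ["fake_0001.jpg"], ["fake_0002.jpg"]])
labels = np.array([0, 0, 0, 1, 1])

# Duplicate minority-class samples until both classes are the same size.
paths_balanced, labels_balanced = RandomOverSampler(random_state=0).fit_resample(image_paths, labels)

# 60/40 train/test split, stratified so both splits keep the class balance.
train_paths, test_paths, train_labels, test_labels = train_test_split(
    paths_balanced, labels_balanced, test_size=0.4, stratify=labels_balanced, random_state=0
)
```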
### Training Procedure
#### Preprocessing
- Images resized to 224x224
- Augmentations: Random rotation, sharpness adjustments, normalization
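Expressed with torchvision transforms, the preprocessing would look roughly like this; the rotation range, sharpness factor, and normalization statistics are assumptions, since the card only names the operations.

```python
from torchvision import transforms

# ImageNet statistics used by the DeiT base checkpoint; assumed here.
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

train_transforms = transforms.Compose([
    transforms.Resize((224, 224)),                           # match the 224x224 ViT input
    transforms.RandomRotation(degrees=15),                   # assumed rotation range
    transforms.RandomAdjustSharpness(sharpness_factor=2.0),  # assumed sharpness adjustment
    transforms.ToTensor(),
    normalize,
])
```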
#### Training Hyperparameters
- **Epochs:** 2
- **Learning rate:** 1e-6
- **Train batch size:** 32
- **Eval batch size:** 8
- **Weight decay:** 0.02
- **Optimizer:** AdamW (via Trainer API)
- **Mixed precision:** Not used
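Expressed through the Trainer API, those settings would look roughly like the sketch below; argument names follow the standard `TrainingArguments` interface, the output directory is a placeholder, and anything not listed above is left at its default.

```python
from transformers import TrainingArguments

# These arguments are passed to a transformers.Trainer together with the model,
# the train/eval datasets, and a compute_metrics function (see Evaluation below).
training_args = TrainingArguments(
    output_dir="virtus-deit",        # placeholder output directory
    num_train_epochs=2,
    learning_rate=1e-6,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=8,
    weight_decay=0.02,               # AdamW is the Trainer's default optimizer
    fp16=False,                      # mixed precision was not used
)
```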
## Evaluation
### Testing Data
The held-out 40% test split of the same dataset (stratified 60:40) was used for evaluation.
### Metrics
- **Accuracy**
- **F1 Score (macro)**
- **Confusion matrix**
- **Classification report**
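One way to compute all four with scikit-learn, in the shape the Trainer API expects; the Real/Fake label order is an assumption, so check `model.config.id2label` on the checkpoint.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, f1_score)

def compute_metrics(eval_pred):
    """Metric callback for transformers.Trainer: logits and labels in, dict out."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    print(confusion_matrix(labels, preds))
    print(classification_report(labels, preds, target_names=["Real", "Fake"]))  # assumed label order
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1_macro": f1_score(labels, preds, average="macro"),
    }
```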
### Results
- **Accuracy:** 99.20%
- **F1 Score (macro):** 0.9920
## Environmental Impact
- **Hardware Type:** NVIDIA Tesla V100 (Kaggle Notebook GPU)
- **Hours used:** ~2.3 hours
- **Cloud Provider:** Kaggle
- **Compute Region:** Unknown
- **Carbon Emitted:** Can be estimated via [MLCO2 Calculator](https://mlco2.github.io/impact#compute)
## Technical Specifications
### Model Architecture and Objective
The model is a distilled Vision Transformer (DeiT) designed for image classification with a binary objective: classify images as Real or Fake.
### Compute Infrastructure
- **Hardware:** 1x NVIDIA Tesla V100 GPU
- **Software:** PyTorch, Hugging Face Transformers, Datasets, Accelerate
## Citation
**BibTeX:**
```bibtex
@misc{virtus2025,
title={Virtus: Deepfake Detection using Vision Transformers},
author={Agasta},
year={2025},
howpublished={\url{https://huggingface.co/agasta/virtus}},
}
```
**APA:**
Agasta. (2025). *Virtus: Deepfake Detection using Vision Transformers*. Hugging Face. https://huggingface.co/agasta/virtus
## Model Card Contact
For questions or feedback, reach out via [GitHub](https://github.com/Itz-Agasta), open an issue on the [model repository](https://github.com/Itz-Agasta/Lopt/tree/main/models/image), or email [email protected].