# 🛡️ Advanced Deepfake Detection Engine
This repository contains a powerful model for detecting deepfake videos. It leverages a sophisticated hybrid architecture combining ResNext50 for spatial feature extraction and an LSTM network for temporal analysis, allowing it to effectively discern authentic videos from synthetically manipulated ones.
## ✨ Core Features
- High Accuracy: Achieves 87% accuracy on our internal test datasets, providing a reliable classification of video authenticity.
- Hybrid Architecture: The fusion of CNN (ResNext50) and RNN (LSTM) captures both the visual artifacts within frames and the temporal inconsistencies between them.
- Built with PyTorch: A flexible and powerful framework for deep learning.
- Easy to Use: Integrated with the Hugging Face `pipeline` for quick and straightforward inference.
## ⚙️ Model Specifications
| Parameter | Details |
|---|---|
| Model Type | Video Classification |
| Primary Task | Binary Deepfake Detection |
| Framework | PyTorch |
| Training Data | Trained on a diverse, custom-built dataset comprising real and deepfake videos. |
| Output | A binary classification (`real`/`fake`) coupled with a confidence score. |
| Architecture | ResNext50 + LSTM |
## 🧠 Under the Hood: Model Architecture
The model's effectiveness stems from its two-stage architecture designed to mimic how a human might spot inconsistencies in a video:
1. Spatial Feature Extractor (ResNext50): For each frame in a video sequence, a pre-trained ResNext50 network acts as a powerful backbone. It analyzes the image to extract a rich set of spatial features, identifying subtle visual artifacts, unnatural textures, or lighting inconsistencies that are common hallmarks of deepfakes.
2. Temporal Sequence Analyzer (LSTM): The sequence of feature vectors extracted by ResNext50 is then fed into a Long Short-Term Memory (LSTM) network. The LSTM analyzes the temporal flow of these features over time. It excels at detecting unnatural transitions, flickering, or inconsistent movements between frames—anomalies that are often missed by single-frame analysis.
This combination allows the model to build a comprehensive understanding of both the look and the flow of a video to make a final, robust prediction.
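To make the two stages concrete, here is a minimal PyTorch sketch of how a ResNext50 + LSTM detector of this kind can be wired together. It is illustrative only: the 2048-dimensional feature size, the 512-unit single-layer LSTM, and the two-logit classification head are assumptions chosen for clarity, not the exact hyperparameters of this model.

```python
import torch
import torch.nn as nn
from torchvision import models

class DeepfakeDetector(nn.Module):
    """Illustrative ResNext50 + LSTM video classifier (layer sizes are assumptions)."""

    def __init__(self, hidden_dim=512, num_classes=2):
        super().__init__()
        # Spatial stage: a pre-trained ResNext50 backbone with its classifier removed,
        # so each frame is mapped to a 2048-dimensional feature vector.
        backbone = models.resnext50_32x4d(weights="IMAGENET1K_V1")
        self.feature_extractor = nn.Sequential(*list(backbone.children())[:-1])
        # Temporal stage: an LSTM reads the per-frame features in order.
        self.lstm = nn.LSTM(input_size=2048, hidden_size=hidden_dim, batch_first=True)
        # Classification head: real vs. fake logits from the final LSTM state.
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, frames):
        # frames: (batch, seq_len, 3, H, W) — e.g. a 20-frame clip per video
        b, t, c, h, w = frames.shape
        feats = self.feature_extractor(frames.view(b * t, c, h, w))  # (b*t, 2048, 1, 1)
        feats = feats.view(b, t, -1)                                 # (b, t, 2048)
        _, (h_n, _) = self.lstm(feats)                               # h_n: (1, b, hidden_dim)
        return self.classifier(h_n[-1])                              # (b, num_classes)
```

In this sketch, a 20-frame clip of 224×224 RGB frames enters as a tensor of shape (1, 20, 3, 224, 224) and comes out as two logits, one per class.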
## ⚠️ Limitations & Important Considerations
- Focus on Human Faces: The model was predominantly trained on datasets featuring human faces. Its performance is optimized for this domain and may be less reliable on videos without clear facial shots.
- Video Quality Dependency: Performance can be affected by video quality. Heavy compression, low resolution, or excessive motion blur can obscure the subtle artifacts the model looks for.
- Optimal Sequence Length: The model was designed and trained on video clips of 20 frames. It can process videos of other lengths, but it performs best on clips of roughly that length (see the frame-sampling sketch after this list).
- Ethical Use: This tool is intended for beneficial applications, such as combating misinformation and verifying media authenticity. Users are responsible for deploying it ethically and in accordance with privacy regulations.
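Because the model works best on roughly 20-frame clips, uniformly sampling frames is a reasonable way to prepare a video before inference. The sketch below uses OpenCV (`pip install opencv-python`); the frame count, the uniform sampling strategy, and the 224×224 resize are assumptions made for illustration, not documented preprocessing for this model.

```python
import cv2
import numpy as np

def sample_frames(video_path, num_frames=20, size=(224, 224)):
    """Uniformly sample `num_frames` RGB frames from a video (sizes are assumptions)."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    indices = np.linspace(0, max(total - 1, 0), num_frames).astype(int)
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if not ok:
            break
        # OpenCV decodes to BGR; convert to RGB and resize to the assumed input size.
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        frames.append(cv2.resize(frame, size))
    cap.release()
    return np.stack(frames) if frames else np.empty((0, *size, 3), dtype=np.uint8)
```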
## 🚀 Quickstart: Usage with `transformers`

Get started in just a few lines of Python. The easiest way to use this model is with the `video-classification` pipeline from the `transformers` library.
```python
# Ensure you have the necessary libraries installed
# pip install transformers torch torchvision
from transformers import pipeline

# 1. Initialize the video classification pipeline with our model
print("Loading the Deepfake Detection model...")
detector = pipeline("video-classification", model="Naman712/Deep-fake-detection")

# 2. Provide the path to your video file
video_path = "path/to/your/video.mp4"
print(f"Analyzing video: {video_path}...")

# 3. Get the prediction
result = detector(video_path)

# 4. Print the result
# The output will be a list of dictionaries with labels ('real' or 'fake') and scores.
print("Analysis Complete!")
print(result)
```
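To turn the raw output into a single verdict, you can pick the highest-scoring entry. This is a small post-processing sketch that assumes `result` is a list of `{"label", "score"}` dictionaries with `real`/`fake` labels, as the comments above describe.

```python
# Minimal post-processing sketch, assuming `result` is a list of
# {"label": ..., "score": ...} dictionaries as described above.
top = max(result, key=lambda item: item["score"])
verdict = "FAKE" if top["label"].lower() == "fake" else "REAL"
print(f"Verdict: {verdict} (confidence: {top['score']:.2%})")
```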