---
title: MCP Hackathon Deepfake Watchdog
emoji: 🛡️
colorFrom: green
colorTo: gray
sdk: gradio
sdk_version: 5.33.1
app_file: app.py
pinned: false
license: mit
short_description: Upload your image and/or voice to scan for deepfake misuse o
---
# 🛡️ Deepfake Watchdog 🤖

🏷️ Tags: `mcp-server-track` `agent-demo-track`

[Video Link](https://www.dropbox.com/scl/fi/bibsndi4tazikhozreg7m/2025-06-09-23-39-26.mkv?rlkey=mhq1mgbr1wuhr1k48hv49i1ga&st=7hbov0m4&dl=0)

A multimodal AI agent that detects deepfake content using **face**, **voice**, and **video** verification. Built with 🧠 DeepFace, 🗣️ SpeechBrain, and 🎥 OpenCV, and accelerated with ☁️ Modal for serverless, GPU-backed execution. Built for Hugging Face's **MCP Hackathon**.

---

## 🚀 Features

- 🧑‍🦱 **Face Verification**: Compare two faces and detect whether they belong to the same person using DeepFace.
- 🎙️ **Voice Verification**: Determine whether two audio samples come from the same speaker with SpeechBrain.
- 🎞️ **Video Scan for Deepfakes**: Scan a video (or YouTube link) and match all faces against a reference image.
- 📄 **PDF Report**: Generate a downloadable report with all scan results.
- 🧠 **Runs as an MCP Agent**: Compatible with [Gradio MCP](https://huggingface.co/docs/mcp), fully pluggable into agentic workflows.
- ⚡ **Modal Offloading**: GPU-backed execution for DeepFace & SpeechBrain using [Modal](https://modal.com).

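As a rough illustration of the face verification feature, the sketch below wraps `DeepFace.verify`, which returns a dictionary that includes `verified`, `distance`, and `threshold`. The helper names `summarize_face_result` and `verify_faces` are hypothetical and are not taken from this repo's `detector/face.py`; DeepFace is imported lazily because it loads TensorFlow.

```python
from typing import Any


def summarize_face_result(result: dict[str, Any]) -> str:
    """Turn a DeepFace-style result dict into a one-line summary."""
    verdict = "same person" if result["verified"] else "different people"
    return (
        f"{verdict} (distance={result['distance']:.3f}, "
        f"threshold={result['threshold']:.3f})"
    )


def verify_faces(image_a: str, image_b: str) -> str:
    """Compare two image files and summarize the outcome (hypothetical helper)."""
    # Imported lazily: DeepFace pulls in TensorFlow and is slow to load.
    from deepface import DeepFace

    result = DeepFace.verify(img1_path=image_a, img2_path=image_b)
    return summarize_face_result(result)
```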
---

## 🧱 Architecture

    Gradio UI ──> MCP Agent ──> Modal Functions
                                 └─> verify_faces_remote()
                                 └─> verify_voices_remote()
                                 └─> verify_faces_in_video_remote()

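The flow in the diagram can be sketched as a small dispatcher that routes each check to a remote (Modal) callable. Falling back to a local implementation when the remote call fails is an assumption made for this sketch, not documented behavior of the app, and the names `Backend` and `run_check` are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable

Check = Callable[..., Any]


@dataclass
class Backend:
    """Pairs a remote implementation with a local fallback (hypothetical)."""

    remote: Check
    local: Check


def run_check(backends: dict[str, Backend], kind: str, *args: Any) -> tuple[str, Any]:
    """Run the check named `kind`; return where it ran and its result."""
    backend = backends[kind]  # raises KeyError for an unknown kind
    try:
        return "remote", backend.remote(*args)
    except Exception:
        # Assumption: any remote failure (network, cold start, quota)
        # is handled by rerunning the same check locally.
        return "local", backend.local(*args)
```

In the real app, `remote` might be something like `verify_faces_remote.remote` and `local` a function from `detector/face.py`; both pairings are guesses based on the folder layout.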

---

## 🧩 Tech Stack

| Tool         | Purpose                          |
|--------------|----------------------------------|
| `Gradio`     | Frontend UI + MCP integration    |
| `Modal`      | GPU offloading for ML workloads  |
| `DeepFace`   | Face verification                |
| `SpeechBrain`| Voice verification               |
| `OpenCV`     | Video processing                 |
| `Pillow`     | Image manipulation               |
| `TensorFlow` | Required by DeepFace             |
| `FFmpeg`     | (Optional) for YouTube downloads |

---

## 🛠️ Setup Instructions

### 1. Clone the repo

```bash
git clone https://huggingface.co/spaces/abetavarez/MCP-Hackathon-Deepfake-Watchdog
cd MCP-Hackathon-Deepfake-Watchdog
```

### 2. Create .env
```bash
cp .env.example .env
# Add your API keys or configs if required
```
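If you prefer not to add a dependency for reading `.env`, a minimal loader could look like the sketch below. It only handles plain `KEY=VALUE` lines, and `load_env` is a hypothetical helper, not part of this repo; which keys the app actually expects is defined by `.env.example`.

```python
import os


def load_env(path: str = ".env") -> dict[str, str]:
    """Parse KEY=VALUE lines from `path` and export them to os.environ."""
    loaded: dict[str, str] = {}
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blanks, comments, and malformed lines
            key, value = line.split("=", 1)
            key, value = key.strip(), value.strip().strip("\"'")
            loaded[key] = value
            # Variables already set in the shell take precedence.
            os.environ.setdefault(key, value)
    return loaded
```

Calling `load_env()` once at the top of `app.py` would make the values available through `os.environ`.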

### 3. Install dependencies (locally)
```bash
pip install -r requirements.txt
```

### 4. Run the app (with MCP)
```bash
modal run app.py
```

Or, for development:
```bash
python app.py
```

## ⚙️ Modal Setup
The app uses a single `modal_app/backend.py` file that defines:

- `verify_faces_remote`
- `verify_voices_remote`
- `verify_faces_in_video_remote`

Each function is GPU-accelerated and runs independently on demand.

```bash
modal deploy modal_app/backend.py
```
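Once deployed, the functions can be called by name from any Python process. The sketch below uses `modal.Function.from_name`, which is available in recent Modal releases. The app name `deepfake-watchdog` and the assumption that `verify_faces_remote` accepts two images as bytes are placeholders; check `modal_app/backend.py` for the real name and signature.

```python
from pathlib import Path
from typing import Any

# Assumption: replace with the name your Modal app was deployed under.
APP_NAME = "deepfake-watchdog"


def read_bytes(path: str) -> bytes:
    """Load a file as raw bytes so it can be sent to a remote function."""
    return Path(path).read_bytes()


def call_remote_face_check(image_a: str, image_b: str) -> Any:
    """Invoke the deployed `verify_faces_remote` function on Modal."""
    import modal  # imported lazily so this module loads without Modal installed

    fn = modal.Function.from_name(APP_NAME, "verify_faces_remote")
    # Assumption: the remote function takes the two images as bytes.
    return fn.remote(read_bytes(image_a), read_bytes(image_b))
```

Sending bytes rather than file paths matters here because the remote container cannot see files on the caller's disk.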

## 🧠 Agent MCP Integration
This app runs as an MCP server using:

```python
demo.launch(mcp_server=True)
```
You can use this tool in your LlamaIndex, Hugging Face or LangChain agent as a remote service.

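With `mcp_server=True`, Gradio exposes the functions behind the interface as MCP tools and uses their type hints and docstrings to describe them, so those are worth writing carefully. The sketch below shows the general shape; `voice_match_verdict`, its 0.25 threshold, and `build_demo` are illustrative and not taken from this repo's `app.py`. MCP support may need the extra installed with `pip install "gradio[mcp]"`. Calling `main()` starts the web UI and the MCP endpoint together.

```python
def voice_match_verdict(score: float, threshold: float = 0.25) -> str:
    """Report whether a speaker-similarity score indicates the same speaker.

    Args:
        score: Similarity between two speaker embeddings (higher is closer).
        threshold: Minimum score counted as a match (illustrative default).

    Returns:
        A short verdict string.
    """
    same = score >= threshold
    label = "same speaker" if same else "different speakers"
    return f"{label} (score={score:.2f})"


def build_demo():
    """Wrap the tool function in a Gradio interface."""
    import gradio as gr  # lazy import keeps the tool usable without Gradio

    return gr.Interface(
        fn=voice_match_verdict,
        inputs=[gr.Number(label="score"), gr.Number(label="threshold", value=0.25)],
        outputs=gr.Textbox(label="verdict"),
    )


def main() -> None:
    # mcp_server=True serves the MCP endpoint alongside the web UI.
    build_demo().launch(mcp_server=True)
```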
## 🏆 Hackathon Tracks
- ✅ Track 1 (MCP Tool): turned into a reusable tool with `mcp_server=True`
- ✅ Track 3 (Agentic Demo): a fully capable AI agent for deepfake detection (LlamaIndex)
- ✅ Modal: offloaded heavy tasks to Modal's GPU cloud

## 🧱 Folder Structure

    .
    ├── app.py                # Main app entry point (Gradio + MCP)
    ├── llama_agent.py        # LlamaIndex Agent implementation
    ├── tools.py              # Tools for LlamaIndex Agent and MCP
    ├── modal_app/
    │   └── backend.py        # Modal remote functions
    ├── detector/
    │   ├── face.py           # Local face verification logic
    │   ├── voice.py          # Local voice verification logic
    │   └── video.py          # Local video scan logic
    ├── reports/
    │   └── pdf_report.py     # PDF report generation
    ├── utils/
    │   └── youtube_utils.py  # YouTube download helper
    ├── README.md
    ├── requirements.txt
    └── .env.example


## 🧑‍💻 Author
Abraham Efren Tavarez

- GitHub: [@AbeTavarez](https://github.com/AbeTavarez)
- LinkedIn: [@abrahametavarez](https://www.linkedin.com/in/abrahametavarez/)

## ✨ Future Improvements
- Add S3-compatible cloud storage or video ingest
- Add Nebius S3 support for video uploads
- Add RAG-based evidence summarizer
- Enable auto-scan via background agent
- Add QR code for mobile voice capture