Taino committed on
Commit f5aec45 · verified · 1 Parent(s): 588a03f

Upload 6 files

Files changed (6)
  1. .gitignore +4 -0
  2. ReadMe.md +221 -0
  3. app.py +371 -0
  4. detection.pt +3 -0
  5. detection.py +215 -0
  6. requirements.txt +14 -0
.gitignore ADDED
@@ -0,0 +1,4 @@
+ .gradio
+ *.mp4
+ *.json
+ *.log
ReadMe.md ADDED
@@ -0,0 +1,221 @@
+ # 🎥 Video Person Detection & Tracking with ReID
+
+ A sophisticated computer vision application that combines YOLOv8, InsightFace, and TorchReID for robust person detection, tracking, and re-identification in videos. The application provides a user-friendly Gradio interface for easy video processing.
+
+ ## 🔧 Technology Stack
+
+ - **YOLOv8**: Real-time person detection
+ - **ByteTrack**: Multi-object tracking algorithm
+ - **InsightFace**: Facial feature extraction for person identification
+ - **OSNet**: Full-body re-identification features
+ - **Gradio**: Web-based user interface
+
+ ## 📋 Features
+
+ - Real-time person detection and tracking
+ - Consistent person re-identification across frames
+ - Face and body feature extraction
+ - Interactive web interface
+ - JSON export of tracking data
+ - Support for multiple video formats
+
+ ## 🚀 Quick Start
+
+ ### Prerequisites
+
+ **System Requirements:**
+ - Python 3.8 or higher
+ - CUDA-compatible GPU (recommended for better performance)
+ - At least 4GB RAM
+ - 2GB free disk space
+
+ **Platform-Specific Dependencies:**
+
+ **Linux:**
+ ```bash
+ # Install the g++ compiler (required for InsightFace)
+ sudo apt-get update
+ sudo apt-get install g++ build-essential
+ ```
+
+ **Windows:**
+ - Install the [Microsoft Visual C++ Redistributable](https://aka.ms/vs/17/release/vc_redist.x64.exe) (latest version)
+ - Ensure you have Visual Studio Build Tools or Visual Studio Community installed
+
+ **macOS:**
+ ```bash
+ # Install Xcode command line tools
+ xcode-select --install
+ ```
+
+ ### Installation
+
+ 1. **Clone the repository:**
+ ```bash
+ git clone [email protected]:zebshah7851/object-detection-and-tracking.git
+ cd object-detection-and-tracking
+ ```
+
+ 2. **Create a virtual environment:**
+ ```bash
+ python -m venv venv
+
+ # Activate the virtual environment
+ # On Windows:
+ venv\Scripts\activate
+ # On Linux/macOS:
+ source venv/bin/activate
+ ```
+
+ 3. **Install dependencies:**
+ ```bash
+ pip install --upgrade pip
+ pip install -r requirements.txt
+ ```
+
+ **Note:** Installation may take 10-15 minutes due to large downloads (PyTorch, CUDA libraries, etc.).
+
+ ### Model Setup
+
+ The application requires several pre-trained models:
+
+ 1. **YOLOv8 Detection Model:**
+    - Place your trained `detection.pt` model file in the project root directory
+    - Alternatively, edit `app.py` to load a stock YOLOv8 model (e.g. `yolov8n.pt`), which Ultralytics downloads automatically
+
+ 2. **InsightFace Model:**
+    - The `buffalo_l` model will be automatically downloaded on first run
+    - Requires ~2GB of storage space
+
+ 3. **TorchReID Model:**
+    - The `osnet_x0_25` model will be automatically downloaded
+    - Pre-trained on the Market1501 dataset
+
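+ To confirm the models load before launching the app, you can run a short sanity check. This is a minimal sketch mirroring the model initialization in `app.py`; the file name `check_models.py` is just a suggestion:
+
+ ```python
+ # check_models.py -- hypothetical helper; mirrors VideoProcessor.__init__ in app.py
+ import torch
+ from ultralytics import YOLO
+ from insightface.app import FaceAnalysis
+ import torchreid
+
+ print("CUDA available:", torch.cuda.is_available())
+
+ model = YOLO('detection.pt')  # expects detection.pt in the project root
+ face_app = FaceAnalysis(name='buffalo_l', providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
+ face_app.prepare(ctx_id=0)    # downloads buffalo_l on first run
+ reid_extractor = torchreid.utils.FeatureExtractor(
+     model_name='osnet_x0_25',  # pretrained weights download automatically
+     model_path=None,
+     device='cuda' if torch.cuda.is_available() else 'cpu'
+ )
+ print("All models loaded.")
+ ```
+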
+ ### Running the Application
+
+ 1. **Start the Gradio interface:**
+ ```bash
+ python app.py
+ ```
+
+ 2. **Access the web interface:**
+    - Open your browser and navigate to: `http://127.0.0.1:7860`
+    - The interface will load automatically
+
+ 3. **Process videos:**
+    - Upload a video file (MP4, AVI, MOV, WEBM)
+    - Click "🚀 Process Video"
+    - Download the processed video and tracking data
+
+ ## 📁 Project Structure
+
+ ```
+ object-detection-and-tracking/
+ ├── app.py               # Gradio web interface
+ ├── detection.py         # Core detection script
+ ├── requirements.txt     # Python dependencies
+ ├── README.md            # This file
+ ├── outputs/             # Generated output files
+ ├── detection.pt         # YOLOv8 model for person detection
+ └── logs/                # Application logs
+ ```
+
+ ## 🔧 Configuration
+
+ ### Model Parameters
+
+ You can adjust the following parameters in `app.py`:
+
+ ```python
+ DETECTION_THRESHOLD = 0.75  # Person detection confidence threshold
+ SIMILARITY_THRESHOLD = 0.6  # Person re-identification threshold
+ ```
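+
+ The similarity threshold governs re-identification: a new detection is matched to a known person only if the cosine similarity between their embeddings exceeds it. A minimal sketch of that matching step, mirroring `assign_global_id` in `app.py`:
+
+ ```python
+ import numpy as np
+
+ def cosine_match(known_embs, known_ids, embedding, threshold=0.6):
+     """Return the matched person ID, or None if no known embedding is similar enough."""
+     # app.py additionally skips stored embeddings whose dimensionality differs
+     embs = np.array(known_embs)
+     sims = np.dot(embs, embedding) / (
+         np.linalg.norm(embs, axis=1) * np.linalg.norm(embedding) + 1e-6
+     )
+     best = int(np.argmax(sims))
+     return known_ids[best] if sims[best] > threshold else None
+ ```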
+
+ ### Performance Optimization
+
+ **For GPU acceleration:**
+ - Ensure CUDA is properly installed
+ - The application automatically detects and uses a GPU if available (see the check below)
+ - Monitor GPU memory usage for large videos
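+
+ A quick way to confirm that both PyTorch and ONNX Runtime (used by InsightFace) can see the GPU; a minimal check, not part of the app itself:
+
+ ```python
+ import torch
+ import onnxruntime as ort
+
+ print("PyTorch CUDA available:", torch.cuda.is_available())
+ print("ONNX Runtime providers:", ort.get_available_providers())
+ ```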
+
+ **For CPU-only systems:**
+ - Reduce video resolution before processing (for example, downscale with OpenCV as sketched below)
+ - Process shorter video segments
+ - Expect longer processing times
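+
+ A minimal OpenCV sketch for downscaling a clip before processing; the file names `input.mp4` and `small.mp4` are placeholders:
+
+ ```python
+ import cv2
+
+ cap = cv2.VideoCapture("input.mp4")              # placeholder input path
+ fps = cap.get(cv2.CAP_PROP_FPS)
+ w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)) // 2  # halve each dimension
+ h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)) // 2
+ out = cv2.VideoWriter("small.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
+
+ while True:
+     ret, frame = cap.read()
+     if not ret:
+         break
+     out.write(cv2.resize(frame, (w, h)))
+
+ cap.release()
+ out.release()
+ ```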
+
+ ## 📊 Output Format
+
+ ### Processed Video
+ - Annotated video with bounding boxes
+ - Consistent person IDs across frames
+ - Real-time tracking visualization
+
+ ### JSON Tracking Data
+ ```json
+ {
+   "metadata": {
+     "total_frames": 1500,
+     "total_people": 5,
+     "id_mapping": {...}
+   },
+   "frames": [
+     {
+       "frame": 1,
+       "people": [
+         {
+           "person_id": 1,
+           "center_x": 320.5,
+           "center_y": 240.0,
+           "confidence": 0.85,
+           "bbox": {"x1": 100, "y1": 50, "x2": 200, "y2": 300}
+         }
+       ]
+     }
+   ]
+ }
+ ```
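+
+ A short sketch of how this file might be consumed, e.g. to recover each person's trajectory; the path is a placeholder, since the actual output files are timestamped:
+
+ ```python
+ import json
+ from collections import defaultdict
+
+ with open("tracking_data.json") as f:   # placeholder path
+     data = json.load(f)
+
+ trajectories = defaultdict(list)        # person_id -> [(frame, x, y), ...]
+ for frame in data["frames"]:
+     for person in frame["people"]:
+         trajectories[person["person_id"]].append(
+             (frame["frame"], person["center_x"], person["center_y"])
+         )
+
+ for pid, points in trajectories.items():
+     print(f"Person {pid}: seen in {len(points)} frames")
+ ```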
+
+ ## 🐛 Troubleshooting
+
+ ### Common Issues
+
+ **Installation Problems:**
+
+ 1. **InsightFace installation fails:**
+ ```bash
+ # Try installing specific pinned versions
+ pip install insightface==0.7.3
+ pip install onnxruntime-gpu==1.14.1
+ ```
+
+    If you are running Linux, you need to install g++. If you are running Windows, install the latest Visual C++ Redistributable.
+
+ 2. **Model download issues:**
+    - Check your internet connection
+    - Manually download the models if the automatic download fails
+    - Ensure sufficient disk space
+
+ **Runtime Issues:**
+
+ 1. **Video won't load in the browser:**
+    - Try downloading the output video manually
+    - Check browser compatibility
+    - Clear the browser cache
+
+ 2. **Slow processing:**
+    - Use GPU acceleration if available
+    - Raise the detection threshold so fewer low-confidence detections are processed
+    - Process shorter video segments
+
+ 3. **High memory usage:**
+    - Monitor system resources
+    - Close unnecessary applications
+    - Use smaller input videos
+
+ ## 📝 System Requirements
+
+ - **CPU:** Intel i5 or AMD Ryzen 5 (4 cores)
+ - **RAM:** 8GB
+ - **Storage:** 5GB free space
+ - **GPU:** Optional, but recommended for faster processing
app.py ADDED
@@ -0,0 +1,371 @@
+ import warnings
+ warnings.filterwarnings("ignore")
+
+ import gradio as gr
+ import cv2
+ import numpy as np
+ import json
+ import os
+ from datetime import datetime
+ from ultralytics import YOLO
+ from insightface.app import FaceAnalysis
+ import torchreid
+ import torch
+ import logging
+ import shutil
+ import tempfile
+ import uuid
+
+ # ========== Logging Configuration ==========
+ logging.basicConfig(
+     level=logging.INFO,
+     format='[%(asctime)s] [%(levelname)s] %(message)s',
+     handlers=[
+         logging.FileHandler("app.log"),
+         logging.StreamHandler()
+     ]
+ )
+ logger = logging.getLogger(__name__)
+
+ # ========== Configuration ==========
+ DETECTION_THRESHOLD = 0.75
+
+ # Create output directory for Gradio
+ OUTPUT_DIR = os.path.join(os.getcwd(), "outputs")
+ os.makedirs(OUTPUT_DIR, exist_ok=True)
+
+ # ========== Video Processing Class ==========
+ class VideoProcessor:
+     def __init__(self):
+         try:
+             self.model = YOLO('detection.pt')
+             self.face_app = FaceAnalysis(name='buffalo_l', providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
+             self.face_app.prepare(ctx_id=0)
+             self.reid_extractor = torchreid.utils.FeatureExtractor(
+                 model_name='osnet_x0_25',
+                 model_path=None,
+                 device='cuda' if torch.cuda.is_available() else 'cpu'
+             )
+             self.models_loaded = True
+             logger.info("Models loaded successfully.")
+         except Exception:
+             logger.exception("Model loading failed.")
+             self.models_loaded = False
+         self.reset_tracking()
+
+     def reset_tracking(self):
+         self.known_embeddings = []
+         self.known_ids = []
+         self.next_global_id = 1
+         self.track_to_global = {}
+         self.tracking_data = {
+             "metadata": {
+                 "total_frames": 0,
+                 "total_people": 0,
+                 "id_mapping": {}
+             },
+             "frames": []
+         }
+         logger.info("Tracking state reset.")
+
+     def extract_embeddings(self, person_crop):
+         face_embedding, body_embedding = None, None
+         try:
+             faces = self.face_app.get(person_crop)
+             if faces:
+                 face_embedding = faces[0].embedding
+         except Exception:
+             logger.debug("Face embedding failed.")
+         try:
+             body_input = cv2.resize(person_crop, (128, 256))
+             body_input = cv2.cvtColor(body_input, cv2.COLOR_BGR2RGB)
+             body_embedding = self.reid_extractor(body_input)[0].cpu().numpy()
+         except Exception:
+             logger.debug("Body embedding failed.")
+
+         if face_embedding is not None and body_embedding is not None:
+             return np.concatenate((face_embedding, body_embedding)).astype(np.float32)
+         elif face_embedding is not None:
+             return face_embedding.astype(np.float32)
+         elif body_embedding is not None:
+             return body_embedding.astype(np.float32)
+         return None
+
+     def assign_global_id(self, embedding, track_id):
+         if embedding is None:
+             return self.track_to_global.get(track_id, f"T{track_id}")
+         match_found = False
+         if self.known_embeddings:
+             matching_embeddings = [
+                 (emb, gid) for emb, gid in zip(self.known_embeddings, self.known_ids)
+                 if emb.shape[0] == embedding.shape[0]
+             ]
+             if matching_embeddings:
+                 embs, gids = zip(*matching_embeddings)
+                 embs = np.array(embs)
+                 sims = np.dot(embs, embedding) / (
+                     np.linalg.norm(embs, axis=1) * np.linalg.norm(embedding) + 1e-6
+                 )
+                 best_match = np.argmax(sims)
+                 if sims[best_match] > 0.6:
+                     global_id = gids[best_match]
+                     match_found = True
+         if not match_found:
+             global_id = self.next_global_id
+             self.next_global_id += 1
+             self.known_embeddings.append(embedding)
+             self.known_ids.append(global_id)
+         if track_id is not None:
+             self.track_to_global[track_id] = global_id
+         return global_id
+
+     def process_video(self, input_video_path, progress_callback=None):
+         if not self.models_loaded:
+             raise Exception("Models not loaded properly")
+
+         self.reset_tracking()
+
+         # Create output files with timestamp
+         timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
+         unique_id = str(uuid.uuid4())[:8]
+
+         # Use OUTPUT_DIR instead of a temp directory
+         output_video_path = os.path.join(OUTPUT_DIR, f"tracked_video_{timestamp}_{unique_id}.mp4")
+         output_json_path = os.path.join(OUTPUT_DIR, f"tracking_data_{timestamp}_{unique_id}.json")
+
+         cap = cv2.VideoCapture(input_video_path)
+         if not cap.isOpened():
+             raise Exception("Could not open video file")
+
+         width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
+         height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
+         fps = cap.get(cv2.CAP_PROP_FPS)
+         total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
+
+         # Try the mp4v codec first; fall back to alternatives if the writer fails to open
+         fourcc = cv2.VideoWriter_fourcc(*'mp4v')
+         out = cv2.VideoWriter(output_video_path, fourcc, fps, (width, height))
+
+         # Verify the video writer initialized properly
+         if not out.isOpened():
+             logger.warning("mp4v codec failed, trying XVID")
+             fourcc = cv2.VideoWriter_fourcc(*'XVID')
+             output_video_path = output_video_path.replace('.mp4', '.avi')
+             out = cv2.VideoWriter(output_video_path, fourcc, fps, (width, height))
+
+         if not out.isOpened():
+             logger.warning("XVID codec failed, trying H264")
+             fourcc = cv2.VideoWriter_fourcc(*'H264')
+             output_video_path = output_video_path.replace('.avi', '.mp4')
+             out = cv2.VideoWriter(output_video_path, fourcc, fps, (width, height))
+
+         frame_count = 0
+
+         while True:
+             ret, frame = cap.read()
+             if not ret:
+                 break
+
+             frame_count += 1
+             if progress_callback:
+                 progress_callback(frame_count / total_frames, f"Processing frame {frame_count}/{total_frames}")
+
+             frame_data = {"frame": frame_count, "people": []}
+
+             try:
+                 results = self.model.track(
+                     frame, tracker="bytetrack.yaml", persist=True, verbose=False, conf=DETECTION_THRESHOLD
+                 )
+
+                 for result in results:
+                     if result.boxes is not None:
+                         boxes = result.boxes.xyxy.cpu().numpy()
+                         confidences = result.boxes.conf.cpu().numpy()
+                         track_ids = result.boxes.id.int().cpu().tolist() if result.boxes.id is not None else [None] * len(boxes)
+
+                         for box, conf, track_id in zip(boxes, confidences, track_ids):
+                             x1, y1, x2, y2 = map(int, box)
+                             person_crop = frame[y1:y2, x1:x2]
+                             if person_crop.size > 0:
+                                 embedding = self.extract_embeddings(person_crop)
+                                 global_id = self.assign_global_id(embedding, track_id)
+
+                                 frame_data["people"].append({
+                                     "person_id": global_id,
+                                     "center_x": (x1 + x2) / 2,
+                                     "center_y": (y1 + y2) / 2,
+                                     "confidence": float(conf),
+                                     "bbox": {"x1": float(x1), "y1": float(y1), "x2": float(x2), "y2": float(y2)}
+                                 })
+
+                                 cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
+                                 cv2.putText(frame, f"ID {global_id}", (x1, y1 - 10),
+                                             cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
+             except Exception:
+                 logger.exception(f"Error processing frame {frame_count}.")
+
+             self.tracking_data["frames"].append(frame_data)
+             out.write(frame)
+
+         cap.release()
+         out.release()
+
+         # Verify the output file was created and has content
+         if not os.path.exists(output_video_path) or os.path.getsize(output_video_path) == 0:
+             raise Exception("Output video file was not created properly")
+
+         self.tracking_data["metadata"]["total_frames"] = frame_count
+         self.tracking_data["metadata"]["total_people"] = len(set(self.known_ids))
+         self.tracking_data["metadata"]["id_mapping"] = {str(k): v for k, v in self.track_to_global.items()}
+
+         # Save JSON file
+         with open(output_json_path, 'w') as f:
+             json.dump(self.tracking_data, f, indent=2)
+
+         logger.info(f"Video processing completed. Saved to {output_video_path}")
+         logger.info(f"Video file size: {os.path.getsize(output_video_path)} bytes")
+
+         return output_video_path, output_json_path
+
+ # ========== Processor ==========
+ processor = VideoProcessor()
+
+ # ========== Gradio Handler ==========
+ def process_video_gradio(input_video, progress=gr.Progress()):
+     if input_video is None:
+         return None, None, "Please upload a video file."
+
+     try:
+         def progress_callback(prog, message):
+             progress(prog, desc=message)
+
+         # Process the video
+         output_video_path, output_json_path = processor.process_video(input_video, progress_callback)
+
+         # Verify files exist and are accessible
+         if not os.path.exists(output_video_path):
+             raise Exception(f"Output video not found at {output_video_path}")
+         if not os.path.exists(output_json_path):
+             raise Exception(f"Output JSON not found at {output_json_path}")
+
+         # Read tracking data for stats
+         with open(output_json_path, 'r') as f:
+             data = json.load(f)
+
+         stats = f"""
+ **Processing Complete!** ✅
+
+ - **Total Frames Processed:** {data['metadata']['total_frames']}
+ - **Total People Detected:** {data['metadata']['total_people']}
+ - **Unique IDs Assigned:** {len(data['metadata']['id_mapping'])}
+ - **Output Video Size:** {os.path.getsize(output_video_path) / (1024*1024):.1f} MB
+
+ 📹 **Output video** is ready for download
+ 📄 **JSON tracking data** contains frame-by-frame detection results
+ """
+
+         logger.info(f"Returning video path: {output_video_path}")
+         logger.info(f"Video exists: {os.path.exists(output_video_path)}")
+
+         return output_video_path, output_json_path, stats
+
+     except Exception as e:
+         logger.exception("Video processing failed.")
+         return None, None, f"❌ **Error processing video:** {str(e)}"
+
+ # ========== Gradio Interface ==========
+ def create_interface():
+     with gr.Blocks(title="Video Person Detection & Tracking", theme=gr.themes.Soft()) as demo:
+         gr.Markdown("# 🎥 Video Person Detection & Tracking with ReID")
+         gr.Markdown("Upload a video to detect and track people using YOLOv8, InsightFace, and ReID models for consistent person identification across frames.")
+
+         with gr.Row():
+             with gr.Column(scale=1):
+                 input_video = gr.Video(
+                     label="📂 Upload Input Video",
+                     height=400,
+                     interactive=True
+                 )
+                 process_btn = gr.Button(
+                     "🚀 Process Video",
+                     variant="primary",
+                     size="lg"
+                 )
+
+             with gr.Column(scale=1):
+                 output_video = gr.Video(
+                     label="🎬 Processed Video (with tracking)",
+                     height=400,
+                     interactive=False,
+                     show_download_button=True  # Enable download button
+                 )
+                 download_json = gr.File(
+                     label="📊 Download Tracking Data (JSON)",
+                     interactive=False
+                 )
+
+         with gr.Row():
+             status_text = gr.Markdown("📤 Upload a video and click **'Process Video'** to start tracking people.")
+
+         # Event handler
+         process_btn.click(
+             fn=process_video_gradio,
+             inputs=[input_video],
+             outputs=[output_video, download_json, status_text],
+             show_progress=True
+         )
+
+         # Additional information
+         with gr.Accordion("📖 How it works", open=False):
+             gr.Markdown("""
+ ### 🔧 **Technology Stack:**
+ - **YOLOv8:** Real-time person detection
+ - **ByteTrack:** Multi-object tracking algorithm
+ - **InsightFace:** Facial feature extraction for person identification
+ - **OSNet:** Full-body re-identification features
+
+ ### 📋 **Process:**
+ 1. **Detection:** YOLOv8 detects people in each frame
+ 2. **Tracking:** ByteTrack assigns temporary tracking IDs
+ 3. **Feature Extraction:** InsightFace + OSNet extract identifying features
+ 4. **Re-identification:** Combines face and body features for consistent global IDs
+ 5. **Output:** Generates annotated video + detailed JSON tracking data
+
+ ### 📁 **Supported Formats:**
+ - **Input:** MP4, AVI, MOV, WEBM
+ - **Output:** MP4 video + JSON metadata
+ """)
+
+         with gr.Accordion("⚙️ Model Configuration", open=False):
+             gr.Markdown(f"""
+ - **Detection Threshold:** {DETECTION_THRESHOLD}
+ - **Similarity Threshold:** 0.6 (for person re-identification)
+ - **Device:** {"CUDA" if torch.cuda.is_available() else "CPU"}
+ - **Output Directory:** {OUTPUT_DIR}
+ """)
+
+         with gr.Accordion("🔧 Troubleshooting", open=False):
+             gr.Markdown("""
+ **If the video doesn't display:**
+ 1. Check if the output file exists in the outputs directory
+ 2. Try downloading the video manually
+ 3. Ensure proper video codec support
+
+ **Common issues:**
+ - Large video files may take time to load
+ - Some browsers may not support certain video formats
+ - Network issues can affect video streaming
+ """)
+
+     return demo
+
+ # ========== Launch ==========
+ if __name__ == "__main__":
+     demo = create_interface()
+     # Serve files from the outputs directory so processed videos can be displayed
+     demo.launch(
+         share=False,
+         server_name="127.0.0.1",
+         server_port=7860,
+         show_error=True,
+         allowed_paths=[OUTPUT_DIR]
+     )
detection.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:04f78656185b52201e8bb37ac0990901ccbfcb4b1455c3f514ea18adc702672c
+ size 40485178
detection.py ADDED
@@ -0,0 +1,215 @@
+ import cv2
+ import numpy as np
+ from ultralytics import YOLO
+ from insightface.app import FaceAnalysis
+ import torchreid
+ import torch
+
+ # Configuration
+ DETECTION_THRESHOLD = 0.75  # Confidence threshold for person detection
+
+ # =============================================================================
+ # MODEL INITIALIZATION
+ # =============================================================================
+
+ # Load YOLOv8 model with ByteTrack tracker for person detection and tracking
+ # YOLOv8 handles object detection while ByteTrack provides consistent tracking IDs
+ model = YOLO(r'detection.pt')  # Replace with your trained model path
+
+ # Initialize InsightFace for facial feature extraction
+ # Uses the buffalo_l model, which provides high-quality face embeddings
+ face_app = FaceAnalysis(name='buffalo_l', providers=['CUDAExecutionProvider'])
+ face_app.prepare(ctx_id=0)  # Prepare for GPU inference
+
+ # Initialize TorchReID for full-body person re-identification
+ # OSNet is a lightweight but effective model for person ReID
+ reid_extractor = torchreid.utils.FeatureExtractor(
+     model_name='osnet_x0_25',
+     model_path='osnet_x0_25_market1501.pth',  # Pre-trained on the Market1501 dataset
+     device='cuda' if torch.cuda.is_available() else 'cpu'
+ )
+
+ # =============================================================================
+ # GLOBAL VARIABLES FOR PERSON RE-IDENTIFICATION
+ # =============================================================================
+
+ # Storage for known person embeddings and their assigned global IDs
+ known_embeddings = []  # List of combined face+body embeddings
+ known_ids = []         # Corresponding global IDs for each embedding
+ next_global_id = 1     # Counter for assigning new global IDs
+
+ # Mapping from ByteTrack tracker IDs to global person IDs
+ # This helps maintain consistency when tracker IDs change
+ track_to_global = {}
+
+ # =============================================================================
+ # VIDEO INPUT/OUTPUT SETUP
+ # =============================================================================
+
+ # Initialize video capture and output writer
+ cap = cv2.VideoCapture("demo.mp4")  # Input video file
+ width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
+ height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
+ fps = cap.get(cv2.CAP_PROP_FPS)
+
+ # Create output video writer with the same properties as the input
+ out = cv2.VideoWriter("output.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (width, height))
+
+ # =============================================================================
+ # MAIN PROCESSING LOOP
+ # =============================================================================
+
+ while True:
+     ret, frame = cap.read()
+     if not ret:
+         break  # End of video
+
+     # Run YOLOv8 detection with ByteTrack tracking
+     # persist=True maintains tracking across frames
+     results = model.track(frame, tracker="bytetrack.yaml", persist=True,
+                           verbose=False, conf=DETECTION_THRESHOLD)
+
+     # Process each detection result
+     for result in results:
+         # Extract bounding boxes in (x1, y1, x2, y2) format
+         boxes = result.boxes.xyxy.cpu().numpy()
+
+         # Extract tracking IDs if available
+         if result.boxes.id is not None:
+             track_ids = result.boxes.id.int().cpu().tolist()
+         else:
+             # No tracking IDs available, assign None for each detection
+             track_ids = [None] * len(boxes)
+
+         # Process each detected person
+         for box, track_id in zip(boxes, track_ids):
+             x1, y1, x2, y2 = map(int, box)
+
+             # Crop the person from the frame
+             person_crop = frame[y1:y2, x1:x2]
+
+             # Initialize embedding variables
+             face_embedding = None
+             body_embedding = None
+
+             # =============================================================
+             # FACE EMBEDDING EXTRACTION
+             # =============================================================
+
+             # Extract face embedding using InsightFace
+             faces = face_app.get(person_crop)
+             if faces:
+                 # Use the first detected face (most confident)
+                 face_embedding = faces[0].embedding
+
+             # =============================================================
+             # BODY EMBEDDING EXTRACTION
+             # =============================================================
+
+             # Extract body embedding using TorchReID
+             try:
+                 # TorchReID expects 128x256 RGB input
+                 body_input = cv2.resize(person_crop, (128, 256))
+                 body_input = cv2.cvtColor(body_input, cv2.COLOR_BGR2RGB)
+
+                 # Extract features and convert to numpy
+                 body_embedding = reid_extractor(body_input)[0].cpu().numpy()
+             except Exception:
+                 # Handle cases where the crop is too small or invalid
+                 pass
+
+             # =============================================================
+             # EMBEDDING COMBINATION AND PERSON MATCHING
+             # =============================================================
+
+             # Combine face and body embeddings for a robust person representation
+             embedding = None
+             if face_embedding is not None and body_embedding is not None:
+                 # Concatenate both embeddings for maximum distinctiveness
+                 embedding = np.concatenate((face_embedding, body_embedding)).astype(np.float32)
+             elif face_embedding is not None:
+                 # Use only the face embedding if body embedding failed
+                 embedding = face_embedding.astype(np.float32)
+             elif body_embedding is not None:
+                 # Use only the body embedding if face detection failed
+                 embedding = body_embedding.astype(np.float32)
+
+             # Assign global ID based on embedding similarity
+             if embedding is not None:
+                 match_found = False
+
+                 # Search for similar embeddings among known people
+                 if known_embeddings:
+                     # Only compare embeddings of the same dimension
+                     matching_embeddings = [
+                         (emb, gid) for emb, gid in zip(known_embeddings, known_ids)
+                         if emb.shape[0] == embedding.shape[0]
+                     ]
+
+                     if matching_embeddings:
+                         embs, gids = zip(*matching_embeddings)
+                         embs = np.array(embs)
+
+                         # Calculate cosine similarity with all known embeddings
+                         sims = np.dot(embs, embedding) / (
+                             np.linalg.norm(embs, axis=1) * np.linalg.norm(embedding) + 1e-6
+                         )
+
+                         # Find the best match
+                         best_match = np.argmax(sims)
+                         if sims[best_match] > 0.6:  # Similarity threshold
+                             global_id = gids[best_match]
+                             match_found = True
+
+                 # If no match found, assign a new global ID
+                 if not match_found:
+                     global_id = next_global_id
+                     next_global_id += 1
+                     known_embeddings.append(embedding)
+                     known_ids.append(global_id)
+
+                 # Update tracker ID to global ID mapping
+                 if track_id is not None:
+                     track_to_global[track_id] = global_id
+
+                 display_id = global_id
+
+             else:
+                 # No usable embedding available, fall back to the tracker ID
+                 global_id = track_to_global.get(track_id, f"T{track_id}")
+                 display_id = global_id
+
+             # =============================================================
+             # VISUALIZATION
+             # =============================================================
+
+             # Draw bounding box around the detected person
+             cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
+
+             # Display the global ID above the bounding box
+             cv2.putText(frame, f"ID {display_id}", (x1, y1 - 10),
+                         cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
+
+     # =============================================================================
+     # OUTPUT AND DISPLAY
+     # =============================================================================
+
+     # Show the frame with tracking results
+     cv2.imshow("Tracking + ReID", frame)
+
+     # Break loop if 'q' key is pressed
+     if cv2.waitKey(1) & 0xFF == ord('q'):
+         break
+
+     # Write frame to output video
+     out.write(frame)
+
+ # =============================================================================
+ # CLEANUP
+ # =============================================================================
+
+ # Release video capture and writer resources
+ cap.release()
+ out.release()
+ cv2.destroyAllWindows()
requirements.txt ADDED
@@ -0,0 +1,14 @@
+ --extra-index-url https://download.pytorch.org/whl/cu118
+
+ torch==2.4.1
+ torchvision==0.19.1
+ torchaudio==2.4.1
+ gradio==5.35.0
+ insightface==0.7.3
+ onnxruntime-gpu==1.14.1
+ torchreid==0.2.5
+ ultralytics==8.3.161
+ gdown==5.2.0
+ lap==0.5.12
+ tensorboard==2.19.0