Update README.md
# Qwen2.5-VL-3B-TrackAnyObject-LoRa-v1

<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/63fde49f6315a264aba6a7ed/cPo3S-tuu3UgV9_aIOhU1.mp4"></video>
## Introduction

Qwen2.5-VL was not originally trained for object tracking. While it can perform object detection on individual frames or across video inputs, processing N frames sequentially produces identical predictions for each frame, so the model cannot maintain consistent object IDs from one prediction to the next.

We provide a LoRA adapter for Qwen2.5-VL-3B that adds object-tracking capability.
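For intuition, "tracking" here means per-frame detections plus object IDs that stay consistent across frames. A toy illustration of what ID consistency requires — greedy IoU matching between consecutive frames — is sketched below. This is only a conceptual baseline, not how the LoRA adapter works (the adapter learns this behavior end to end):

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def assign_ids(frames_boxes, iou_thresh=0.5):
    """Greedily propagate IDs: each box inherits the ID of the best-overlapping
    unused box from the previous frame, otherwise it gets a fresh ID.
    Returns, per frame, a list of (id, box) pairs."""
    next_id = 0
    prev = []  # (id, box) pairs from the previous frame
    out = []
    for boxes in frames_boxes:
        cur, used = [], set()
        for box in boxes:
            best, best_iou = None, iou_thresh
            for pid, pbox in prev:
                if pid in used:
                    continue
                v = iou(box, pbox)
                if v >= best_iou:
                    best, best_iou = pid, v
            if best is None:
                best = next_id
                next_id += 1
            used.add(best)
            cur.append((best, box))
        out.append(cur)
        prev = cur
    return out
```

A per-frame detector with no memory would, by contrast, have no basis for the ID column at all — which is exactly the gap the adapter fills.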
```python
objects_for_tracking = "person" ## "person, cat", "person, cat, dog"

## Load video and convert to numpy array of shape (num_frames, height, width, channels)
video, fps = read_video(video_path="path to video.mp4", start_frame=0, frames_count=16, max_side=896)
```
### Run inference