Update README.md
# Qwen2.5-VL-3B-TrackAnyObject-LoRa-v1

<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/63fde49f6315a264aba6a7ed/cPo3S-tuu3UgV9_aIOhU1.mp4"></video>
## Introduction

Qwen2.5-VL was not originally trained for object tracking. While it can perform object detection on individual frames or across video inputs, processing N frames sequentially produces identical predictions for each frame, so the model cannot maintain consistent object IDs from one prediction to the next.

We provide a LoRA adapter for Qwen2.5-VL-3B that adds object-tracking capability.
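For intuition, "tracking" here means per-frame detections plus object IDs that stay consistent across frames. A toy illustration of what ID consistency requires — greedy IoU matching between consecutive frames — is sketched below. This is only a conceptual baseline, not how the LoRA adapter works (the adapter learns this behavior end to end):

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def assign_ids(frames_boxes, iou_thresh=0.5):
    """Greedily propagate IDs: each box inherits the ID of the best-overlapping
    unused box from the previous frame, otherwise it gets a fresh ID.
    Returns, per frame, a list of (id, box) pairs."""
    next_id = 0
    prev = []  # (id, box) pairs from the previous frame
    out = []
    for boxes in frames_boxes:
        cur, used = [], set()
        for box in boxes:
            best, best_iou = None, iou_thresh
            for pid, pbox in prev:
                if pid in used:
                    continue
                v = iou(box, pbox)
                if v >= best_iou:
                    best, best_iou = pid, v
            if best is None:
                best = next_id
                next_id += 1
            used.add(best)
            cur.append((best, box))
        out.append(cur)
        prev = cur
    return out
```

A per-frame detector with no memory would, by contrast, have no basis for the ID column at all — which is exactly the gap the adapter fills.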
```python
objects_for_tracking = "person" ## "person, cat", "person, cat, dog"

## Load video and convert to numpy array of shape (num_frames, height, width, channels)
video, fps = read_video(video_path="path to video.mp4", start_frame=0, frames_count=16, max_side=896)
```
### Run inference