Okay this is insane... WebGPU-accelerated semantic video tracking, powered by DINOv3 and Transformers.js! π€― Demo (+ source code): webml-community/DINOv3-video-tracking
This will revolutionize AI-powered video editors... which can now run 100% locally in your browser, no server inference required (costs $0)! π
How does it work? π€ 1οΈβ£ Generate and cache image features for each frame 2οΈβ£ Create a list of embeddings for selected patch(es) 3οΈβ£ Compute cosine similarity between each patch and the selected patch(es) 4οΈβ£ Highlight those whose score is above some threshold
... et voilΓ ! π₯³
You can also make selections across frames to improve temporal consistency! This is super useful if the object changes its appearance slightly throughout the video.