Okay this is insane... WebGPU-accelerated semantic video tracking, powered by DINOv3 and Transformers.js! 🤯
Demo (+ source code): webml-community/DINOv3-video-tracking
This will revolutionize AI-powered video editors... which can now run 100% locally in your browser, no server inference required (costs $0)!
How does it work? 🤔
1️⃣ Generate and cache image features for each frame
2️⃣ Create a list of embeddings for selected patch(es)
3️⃣ Compute cosine similarity between each patch and the selected patch(es)
4️⃣ Highlight those whose score is above some threshold
... et voilà! 🥳
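Steps 3 and 4 fit in a few lines of TypeScript. This is a minimal sketch, not the demo's actual code: `cosineSimilarity` and `highlightPatches` are hypothetical helpers, embeddings are assumed to be plain `Float32Array` vectors, and taking the best score across the selected patches is an assumption about how multiple selections get combined.

```typescript
// Hypothetical helpers for illustration; not taken from the demo source.
type Embedding = Float32Array;

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: Embedding, b: Embedding): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom; // treat zero vectors as "not similar"
}

// For each patch in a frame, keep its best similarity to any selected patch
// (assumption: max over selections) and flag it if that score is above the threshold.
function highlightPatches(
  framePatches: Embedding[],
  selected: Embedding[],
  threshold: number,
): boolean[] {
  return framePatches.map((patch) => {
    let best = -Infinity;
    for (const s of selected) {
      best = Math.max(best, cosineSimilarity(patch, s));
    }
    return best > threshold;
  });
}

// Toy 2-D embeddings, just to show the shape of the data.
const framePatches = [
  new Float32Array([1, 0]),
  new Float32Array([0, 1]),
  new Float32Array([1, 1]),
];
const selectedPatches = [new Float32Array([1, 0])];
const mask = highlightPatches(framePatches, selectedPatches, 0.6);
console.log(mask); // [ true, false, true ]
```

The threshold of 0.6 is arbitrary here; in practice it would be something you tune (or expose as a slider) per video.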
You can also make selections across frames to improve temporal consistency! This is super useful if the object changes its appearance slightly throughout the video.
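One way to picture the cross-frame selections: keep a single pool of reference embeddings, add to it whenever a patch is picked in any frame, and score new patches against the whole pool. The sketch below is an assumption about how this could be wired up, not the demo's implementation; `CrossFrameSelection`, `normalize`, and the `Map`-based feature cache are all hypothetical names. Vectors are L2-normalized up front so that a plain dot product equals cosine similarity.

```typescript
// Hypothetical sketch; the data layout and names are assumptions.
type FrameFeatures = Float32Array[]; // one embedding per patch

// L2-normalize so that a dot product equals cosine similarity.
function normalize(v: Float32Array): Float32Array {
  let sumSq = 0;
  for (let i = 0; i < v.length; i++) sumSq += v[i] * v[i];
  const norm = Math.sqrt(sumSq);
  if (norm === 0) return new Float32Array(v.length);
  return v.map((x) => x / norm);
}

class CrossFrameSelection {
  cache: Map<number, FrameFeatures>;
  references: Float32Array[] = [];

  constructor(cache: Map<number, FrameFeatures>) {
    this.cache = cache;
  }

  // Record a selection of patch `patchIndex` in frame `frameIndex`.
  add(frameIndex: number, patchIndex: number): void {
    const features = this.cache.get(frameIndex);
    if (!features || !features[patchIndex]) {
      throw new Error(`No cached features for frame ${frameIndex}, patch ${patchIndex}`);
    }
    this.references.push(normalize(features[patchIndex]));
  }

  // Best cosine similarity between `patch` and any reference (-1 if none yet).
  score(patch: Float32Array): number {
    const p = normalize(patch);
    let best = -1;
    for (const ref of this.references) {
      let dot = 0;
      for (let i = 0; i < p.length; i++) dot += p[i] * ref[i];
      best = Math.max(best, dot);
    }
    return best;
  }
}

// Toy cache: the object's embedding drifts from [1, 0] to [0.6, 0.8] by frame 5.
const featureCache = new Map<number, FrameFeatures>([
  [0, [new Float32Array([1, 0]), new Float32Array([0, 1])]],
  [5, [new Float32Array([0.6, 0.8]), new Float32Array([0, 1])]],
]);
const selection = new CrossFrameSelection(featureCache);
selection.add(0, 0); // pick the object in frame 0
selection.add(5, 0); // pick it again after its appearance changed
```

With only the frame 0 selection, the drifted patch `[0.6, 0.8]` would score 0.6; adding the frame 5 selection brings it to roughly 1, which is the idea behind selecting across frames.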
Excited to see what the community builds with it!