merve
posted an update Mar 13
I love vision language models 💗
My favorite is KOSMOS-2, because it's a grounded model (it doesn't hallucinate).
In this demo you can:
- ask a question about the image,
- do detailed/brief captioning,
- localize the objects! 🤯
It's just amazing for a VLM to return bounding boxes 🤩
Try it here merve/kosmos2
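
If you want to draw the boxes yourself, here is a minimal sketch. It assumes the model's grounded output has already been parsed into entities of the form (name, text span, list of boxes), with each box given as normalized (x1, y1, x2, y2) values between 0 and 1. The helper name `to_pixel_boxes`, the sample entity, and the image size are all made up for illustration and are not taken from the demo.

```python
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2), each value normalized to [0, 1].
NormBox = Tuple[float, float, float, float]
PixelBox = Tuple[int, int, int, int]


def to_pixel_boxes(norm_boxes: List[NormBox], width: int, height: int) -> List[PixelBox]:
    """Scale normalized boxes to pixel coordinates for a width x height image."""
    pixel_boxes = []
    for x1, y1, x2, y2 in norm_boxes:
        pixel_boxes.append(
            (round(x1 * width), round(y1 * height), round(x2 * width), round(y2 * height))
        )
    return pixel_boxes


# Hypothetical parsed output: (entity name, character span in the caption, boxes).
entities = [("a snowman", (0, 9), [(0.25, 0.5, 0.75, 1.0)])]

for name, _span, boxes in entities:
    print(name, to_pixel_boxes(boxes, width=640, height=480))
```

The pixel boxes can then be passed to any drawing library to overlay rectangles on the image.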