I love vision language models!
My favorite is KOSMOS-2, because it's a grounded model: it ties the phrases it generates to regions of the image, which makes it much less prone to hallucination.
In this demo you can:
- ask a question about the image
- do detailed or brief captioning
- localize the objects! 🤯
It's just amazing for a VLM to return bounding boxes 🤩
Try it here: merve/kosmos2
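How does a language model "return" a bounding box at all? The KOSMOS-2 paper describes splitting the image into a grid of location bins and writing each box as two location tokens: the bin of the top-left corner and the bin of the bottom-right corner. Here is a minimal, hypothetical sketch of turning such tokens back into coordinates. It assumes a 32×32 grid and tokens shaped like `<patch_index_0044>` (modeled on the Hugging Face transformers examples); the function names are made up for illustration, and mapping each bin to its cell center is a simplifying choice. The official post-processing in transformers may treat box edges differently.

```python
import re

# Assumption: KOSMOS-2 style grid of 32 x 32 location bins (1024 location tokens).
NUM_BINS_PER_SIDE = 32

# Assumption: location tokens look like <patch_index_0044> in the generated text.
PATCH_TOKEN = re.compile(r"<patch_index_(\d+)>")


def bin_to_center(index: int, bins_per_side: int = NUM_BINS_PER_SIDE) -> tuple[float, float]:
    """Map a flattened, row-major bin index to the normalized (x, y) center of that cell."""
    row, col = divmod(index, bins_per_side)
    cell = 1.0 / bins_per_side
    return (col + 0.5) * cell, (row + 0.5) * cell


def decode_box(text: str, bins_per_side: int = NUM_BINS_PER_SIDE) -> tuple[float, float, float, float]:
    """Turn the first two location tokens in `text` into a normalized (x1, y1, x2, y2) box."""
    indices = [int(m) for m in PATCH_TOKEN.findall(text)]
    if len(indices) < 2:
        raise ValueError("expected a top-left and a bottom-right location token")
    x1, y1 = bin_to_center(indices[0], bins_per_side)  # top-left corner bin
    x2, y2 = bin_to_center(indices[1], bins_per_side)  # bottom-right corner bin
    return x1, y1, x2, y2


def to_pixels(box: tuple[float, float, float, float], width: int, height: int) -> tuple[int, int, int, int]:
    """Scale a normalized box to pixel coordinates for an image of the given size."""
    x1, y1, x2, y2 = box
    return round(x1 * width), round(y1 * height), round(x2 * width), round(y2 * height)


if __name__ == "__main__":
    # Hypothetical model output grounding the phrase "a snowman".
    output = "<phrase>a snowman</phrase><object><patch_index_0044><patch_index_0863></object>"
    box = decode_box(output)
    print(box)                        # (0.390625, 0.046875, 0.984375, 0.828125)
    print(to_pixels(box, 1024, 768))  # (400, 36, 1008, 636)
```

Because the box lives in ordinary text as two extra tokens, the same decoder can answer questions, write captions, and localize objects without a separate detection head.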