SmolVLM-256M-Detection

experimental and for learning purposes; wouldn't recommend using unless

check out github/shreydan/VLM-OD for results and details.

Usage

  • load the model same as HuggingFaceTB/SmolVLM-256M-Instruct
  • inputs: detect car / detect person;car etc. Apply chat template with add_generation_prompt=True
  • parse the output tokens <loc000> to <loc255> (code in eval.ipynb of my github repo)
  • to reiterate I have not added any <locXXX> special tokens (that needs wayyy more training than this method), the model itself is generating them.

outputs

Downloads last month
19
Safetensors
Model size
256M params
Tensor type
BF16
ยท
Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for shreydan/SmolVLM-256M-Detection

Finetuned
(20)
this model
Quantizations
2 models