# SmolVLM-256M-Detection
This model is experimental and intended for learning purposes; I wouldn't recommend using it for anything serious. Check out github.com/shreydan/VLM-OD for results and details.
## Usage
- Load the model the same way as `HuggingFaceTB/SmolVLM-256M-Instruct`.
- Inputs: prompts such as `detect car`, `detect person;car`, etc. Apply the chat template with `add_generation_prompt=True` (see the end-to-end sketch after this list).
- Parse the output tokens `<loc000>` to `<loc255>` (code in `eval.ipynb` of my GitHub repo; a rough parsing sketch also follows this list). To reiterate: I have not added any `<locXXX>` special tokens (that would need way more training than this method), the model itself is generating them.
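A minimal end-to-end sketch of the flow above, following the standard `HuggingFaceTB/SmolVLM-256M-Instruct` usage pattern. The image path and `max_new_tokens` value here are placeholders, not values from the repo:

```python
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "shreydan/SmolVLM-256M-Detection"

# Load exactly like HuggingFaceTB/SmolVLM-256M-Instruct.
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id)
model.eval()

# Placeholder image path -- use your own image.
image = Image.open("street.jpg")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "detect person;car"},
        ],
    }
]

prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")

with torch.no_grad():
    generated_ids = model.generate(**inputs, max_new_tokens=128)

# The <locXXX> coordinates are plain text tokens (not added special tokens),
# so they survive decoding even with skip_special_tokens=True.
output_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(output_text)
```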
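And a rough sketch of the parsing step. The box encoding assumed here (four consecutive `<locXXX>` values per box in `(x1, y1, x2, y2)` order on a 0-255 grid) is only an assumption; `eval.ipynb` in the repo is the authoritative reference for the exact convention and label handling:

```python
import re


def parse_boxes(text: str, img_w: int, img_h: int):
    """Turn <loc000>..<loc255> tokens into pixel-space boxes.

    Assumes each box is four consecutive <locXXX> values in
    (x1, y1, x2, y2) order, normalized to a 0-255 grid -- check
    eval.ipynb in the repo for the actual convention.
    """
    vals = [int(v) for v in re.findall(r"<loc(\d{3})>", text)]
    boxes = []
    # Drop any trailing incomplete group of coordinates.
    for i in range(0, len(vals) // 4 * 4, 4):
        x1, y1, x2, y2 = vals[i:i + 4]
        boxes.append((
            x1 / 255 * img_w,
            y1 / 255 * img_h,
            x2 / 255 * img_w,
            y2 / 255 * img_h,
        ))
    return boxes


# Example: one box on a 640x480 image.
print(parse_boxes("person <loc012><loc034><loc200><loc220>", 640, 480))
```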