What is the benefit of a text-only model compared to text + vision?
#1 opened by stabgan
Please explain.
Mostly for tool compatibility. If the multimodal model is working for you, this probably doesn't offer you anything now.
I made these myself so I could:
Train control-vectors for Gemma3
Create / run EXL2 quants in a modified exllamav2. I managed to get text working, but vision was beyond me.
(No longer needed, as exllamav2 now supports Gemma3 properly on its dev branch.)
Use abliteration tools, which weren't working with vision.
(These have probably been updated by now, as I see abliterated multimodal models on HF.)
Finetune the model; I was having issues finetuning the multimodal models just after the model release.
(No longer needed, as unsloth supports multimodal now.)