What is the benefit of a text-only model compared to text + vision?

#1 opened by stabgan

Please explain.

Mostly for tool compatibility. If the multimodal model is working for you, this probably doesn't offer you anything now.

I made these myself so I could:

  • Train control vectors for Gemma3

  • Create and run EXL2 quants in a modified exllamav2. I managed to get text working, but vision was beyond me.
    (No longer needed, as exllamav2 now supports Gemma3 properly on the dev branch.)

  • Abliteration tools weren't working with the vision models.
    (These have probably been updated by now, as I see abliterated multimodal models on HF.)

  • I was having issues fine-tuning the multimodal models right after their release.
    (No longer needed, as unsloth now supports multimodal models.)
