MiMo-VL: smol & mighty vision language model by Xiaomi
XiaomiMiMo/mimo-vl-68382ccacc7c2875500cd212
✨ 7B with RL & SFT
✨ Native-resolution ViT for fine-grained perception
✨ MORL = smarter alignment across perception, grounding & reasoning