microsoft/Phi-4-multimodal-instruct Automatic Speech Recognition • 6B • Updated May 1 • 437k • 1.47k
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published Feb 20 • 146