Open LLM Leaderboard
Track, rank and evaluate open LLMs and chatbots
Excellent SLMs (small language models) and SVLMs (small vision language models).
A leaderboard that demonstrates LMM (large multimodal model) reasoning capabilities
Note 0.5B Size. Qwen2.5 Technical Report, https://huggingface.co/papers/2412.15115
Note 1.5B Size. Qwen2.5 Technical Report, https://huggingface.co/papers/2412.15115
Note 3B Size. Qwen2.5 Technical Report, https://huggingface.co/papers/2412.15115
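The small instruct checkpoints collected here load through the standard transformers text-generation pipeline. A minimal sketch, assuming the Qwen/Qwen2.5-0.5B-Instruct repository ID and a recent transformers release (any other small instruct model in this collection can be swapped in):

```python
# Minimal sketch: running a small instruct model from this collection with
# the transformers text-generation pipeline.
# Assumption: the repo id "Qwen/Qwen2.5-0.5B-Instruct"; swap in another
# small model (e.g. a SmolLM2 or Phi-4-mini checkpoint) as needed.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Qwen/Qwen2.5-0.5B-Instruct",
    torch_dtype="auto",
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain what a small language model is in one sentence."}]
result = generator(messages, max_new_tokens=64)
# The pipeline returns the full chat; the last message is the model's reply.
print(result[0]["generated_text"][-1]["content"])
```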
Note EXAONE Deep Released ━ Setting a New Standard for Reasoning AI, https://www.lgresearch.ai/news/view?seq=543
Note Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM, https://huggingface.co/blog/gemma3. Gemma 3 Technical Report, https://huggingface.co/papers/2503.19786
Note Empowering innovation: The next generation of the Phi family, https://azure.microsoft.com/en-us/blog/empowering-innovation-the-next-generation-of-the-phi-family/. Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs, https://huggingface.co/papers/2503.01743
Note 135M Size. SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model, https://huggingface.co/papers/2502.02737
Note 360M Size. SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model, https://huggingface.co/papers/2502.02737
Note 1.7B Size. SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model, https://huggingface.co/papers/2502.02737
Note Granite-3.2-2B-Instruct is a 2-billion-parameter, long-context AI model fine-tuned for thinking capabilities. Tutorials: https://www.ibm.com/granite/docs/ Cookbook: https://github.com/ibm-granite-community/granite-snack-cookbook/tree/main
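The IBM Granite documentation linked above describes toggling the model's extended "thinking" mode through the chat template. A minimal sketch, assuming the ibm-granite/granite-3.2-2b-instruct repository ID and a `thinking` chat-template flag (verify both against the model card and tutorials):

```python
# Minimal sketch of Granite-3.2's reasoning ("thinking") mode.
# Assumptions: repo id ibm-granite/granite-3.2-2b-instruct and a chat-template
# kwarg `thinking=True`; check the IBM tutorials linked above for the exact API.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-3.2-2b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [{"role": "user", "content": "A train travels 120 km in 1.5 hours. What is its average speed?"}]
inputs = tokenizer.apply_chat_template(
    messages,
    thinking=True,               # requests the longer reasoning trace (assumed flag)
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens.
print(tokenizer.decode(output[0, inputs.shape[-1]:], skip_special_tokens=True))
```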
Note TxGemma: Efficient and Agentic LLMs for Therapeutics. Paper: https://storage.googleapis.com/research-media/txgemma/txgemma-report.pdf GitHub repository (supporting code, Colab notebooks, discussions, and issues): https://github.com/google-gemini/gemma-cookbook/tree/main/TxGemma
VLMEvalKit Evaluation Results Collection
Note SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion, https://huggingface.co/papers/2503.11576
Note Empowering innovation: The next generation of the Phi family, https://azure.microsoft.com/en-us/blog/empowering-innovation-the-next-generation-of-the-phi-family/. Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs, https://huggingface.co/papers/2503.01743
Note Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM, https://huggingface.co/blog/gemma3. Gemma 3 Technical Report, https://huggingface.co/papers/2503.19786
Note Safer and Multimodal: Responsible AI with Gemma, https://developers.googleblog.com/en/safer-and-multimodal-responsible-ai-with-gemma/
Note 256M Size. SmolVLM - small yet mighty Vision Language Model, https://huggingface.co/blog/smolvlm. SmolVLM Grows Smaller – Introducing the 250M & 500M Models, https://huggingface.co/blog/smolervlm
Note 500M Size. SmolVLM - small yet mighty Vision Language Model, https://huggingface.co/blog/smolvlm. SmolVLM Grows Smaller – Introducing the 250M & 500M Models, https://huggingface.co/blog/smolervlm
Note 2.2B Size. SmolVLM - small yet mighty Vision Language Model, https://huggingface.co/blog/smolvlm
Note 256M Size. SmolVLM2: Bringing Video Understanding to Every Device, https://huggingface.co/blog/smolvlm2
Note 500M Size. SmolVLM2: Bringing Video Understanding to Every Device, https://huggingface.co/blog/smolvlm2
Note 2.2B Size. SmolVLM2: Bringing Video Understanding to Every Device, https://huggingface.co/blog/smolvlm2
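The SmolVLM and SmolVLM2 entries above share the standard transformers image-chat pattern. A minimal sketch, assuming the HuggingFaceTB/SmolVLM-256M-Instruct repository ID and a placeholder image URL (SmolVLM2 checkpoints follow a similar pattern; see their model cards):

```python
# Minimal sketch of image chat with a SmolVLM-family checkpoint.
# Assumptions: repo id HuggingFaceTB/SmolVLM-256M-Instruct and a placeholder
# image URL; adjust both for your own use.
import torch
import requests
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "HuggingFaceTB/SmolVLM-256M-Instruct"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=torch.bfloat16)

image = Image.open(requests.get("https://example.com/sample.jpg", stream=True).raw)  # placeholder URL
messages = [{"role": "user",
             "content": [{"type": "image"},
                         {"type": "text", "text": "Describe this image in one sentence."}]}]

prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")
generated = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(generated, skip_special_tokens=True)[0])
```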
Note Qwen2.5-VL Technical Report, https://huggingface.co/papers/2502.13923. Grounding: https://qwenlm.github.io/blog/qwen2.5-vl/
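The Qwen2.5-VL blog highlights grounding, i.e. returning bounding boxes for objects named in the prompt. A minimal sketch following the general model-card pattern, assuming the Qwen/Qwen2.5-VL-3B-Instruct repository ID, a transformers release that ships Qwen2_5_VLForConditionalGeneration, the qwen-vl-utils helper package, and a placeholder image path:

```python
# Minimal grounding sketch for Qwen2.5-VL. Assumptions: repo id
# Qwen/Qwen2.5-VL-3B-Instruct, the qwen-vl-utils package, and a placeholder
# local image path; the exact output format of the boxes depends on the model.
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

model_id = "Qwen/Qwen2.5-VL-3B-Instruct"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
processor = AutoProcessor.from_pretrained(model_id)

messages = [{"role": "user", "content": [
    {"type": "image", "image": "file:///path/to/scene.jpg"},  # placeholder path
    {"type": "text", "text": "Locate every person in the image and report their bounding boxes as JSON."},
]}]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
images, videos = process_vision_info(messages)
inputs = processor(text=[text], images=images, videos=videos, padding=True, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=256)
# Decode only the generated continuation, not the prompt.
print(processor.batch_decode(output[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)[0])
```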
Note Granite-vision-3.2-2b is a compact and efficient vision-language model, specifically designed for visual document understanding. Tutorials: https://www.ibm.com/granite/docs/models/vision/ Paper: https://arxiv.org/abs/2502.09927
Note NVIDIA Isaac GR00T N1 is the world's first open foundation model for generalized humanoid robot reasoning and skills. https://github.com/NVIDIA/Isaac-GR00T/
Note Moondream is a small vision language model designed to run efficiently on edge devices. Blog: https://moondream.ai/blog/moondream-2025-03-27-release GitHub: https://github.com/vikhyat/moondream