Describe Anything Collection Multimodal Large Language Models for Detailed Localized Image and Video Captioning • 6 items • Updated about 23 hours ago • 28
Eagle 2 Collection Eagle 2 is a family of frontier vision-language models with a vision-centric design. The models support 4K HD input, long-context video, and grounding. • 9 items • Updated about 23 hours ago • 33
Nemotron-H Collection Mamba-Transformer hybrid models • 5 items • Updated about 23 hours ago • 21
Gemma 3 Collection All versions of Google's new multimodal models in 1B, 4B, 12B, and 27B sizes. In GGUF, dynamic 4-bit and 16-bit formats. • 29 items • Updated about 8 hours ago • 53
Gemma 3 QAT Collection Quantization Aware Trained (QAT) Gemma 3 checkpoints. The models preserve quality similar to half precision while using 3x less memory. • 19 items • Updated 6 days ago • 25
Jina Reranker v2 Collection A collection of state-of-the-art multilingual neural rerankers • 1 item • Updated Sep 17, 2024 • 9
D^2iT: Dynamic Diffusion Transformer for Accurate Image Generation Paper • 2504.09454 • Published 11 days ago • 12
BitNet Collection 🔥BitNet family of large language models (1-bit LLMs). • 6 items • Updated 6 days ago • 28
OpenCodeReasoning Collection Reasoning data for supervised finetuning of LLMs to advance data distillation for competitive coding • 5 items • Updated about 23 hours ago • 7
Llama Nemotron Collection Open, Production-ready Enterprise Models • 4 items • Updated about 23 hours ago • 37
Granite 3.2 Models (GGUF) Collection GGUF-formatted versions of IBM Granite 3.2 models. Licensed under the Apache 2.0 license. • 5 items • Updated Mar 21 • 4
Gemma 3 QAT Collection Quantization Aware Trained (QAT) Gemma 3 checkpoints. The models preserve quality similar to half precision while using 3x less memory. • 15 items • Updated 6 days ago • 161
Qwen2.5-Omni Collection End-to-End Omni (text, audio, image, video, and natural speech interaction) model based on Qwen2.5 • 4 items • Updated about 5 hours ago • 91
Cohere Labs Aya Vision Collection Aya Vision is a state-of-the-art family of vision models that brings multimodal capabilities to 23 languages. • 5 items • Updated 9 days ago • 68