Predicting the Order of Upcoming Tokens Improves Language Modeling Paper • 2508.19228 • Published 9 days ago • 20
view article Article Prefill and Decode for Concurrent Requests - Optimizing LLM Performance By tngtech • Apr 16 • 38
Indonesian Text Similarity Dataset Collection This collection contains currated text similarity datasets that are available in huggingface dataset • 16 items • Updated Jul 11 • 5
view article Article No GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL By toslali-ibm and 5 others • Jun 3 • 84
Gemma 3n Collection Google Gemma 3n models, all versions including Dynamic GGUF, 4-bit, 16-bit and formats! • 10 items • Updated 14 days ago • 21
Softpick: No Attention Sink, No Massive Activations with Rectified Softmax Paper • 2504.20966 • Published Apr 29 • 32
The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs Paper • 2504.17768 • Published Apr 24 • 14
Mistral Small 3 (All Versions) Collection A collection of Mistral's new Small 3.2 and 3 models including GGUF, 4-bit and more! • 20 items • Updated 14 days ago • 15
DeepSeek R1 (All Versions) Collection DeepSeek-R1-0528 is here! The most powerful reasoning open LLM, available in GGUF, original & 4-bit formats. Includes Llama & Qwen distilled models. • 37 items • Updated 14 days ago • 257
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking Paper • 2501.04519 • Published Jan 8 • 284
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Though Paper • 2501.04682 • Published Jan 8 • 99
Phi-4 (All Versions) Collection Microsoft's Phi-4 models including Reasoning + Reasoning Plus & mini. Includes Dynamic 2.0 GGUF, 4-bit & 16-bit versions. Includes Unsloth's bug fixes • 20 items • Updated 14 days ago • 74
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token Paper • 2501.03895 • Published Jan 7 • 53
Load 4bit models 4x faster Collection Native bitsandbytes 4bit pre quantized models • 25 items • Updated 14 days ago • 57
Qwen 2.5 Coder Collection Complete collection of Code-specific model series for Qwen2.5 in bnb 4bit, 16bit and GGUF formats. • 35 items • Updated 14 days ago • 35
Llama 3.2 Vision Collection Meta's Llama 3.2 vision models 11B and 90B. Include 4-bit bnb and original versions. • 8 items • Updated 14 days ago • 7
Llama 3.2 Collection Meta's new Llama 3.2 vision and text models including 1B, 3B, 11B and 90B. Includes GGUF, 4-bit bnb and original versions. • 27 items • Updated 14 days ago • 68