Active filters: 4bit
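This listing appears to come from a Hugging Face Hub model search filtered by the "4bit" tag; the trailing numbers on each entry look like download and like counts. As a minimal sketch, a similar listing can be reproduced programmatically with the huggingface_hub client. The sort order, limit, and printed fields below are illustrative choices, not part of the original page.

```python
# Minimal sketch: list Hub models carrying the "4bit" tag.
# Assumes the huggingface_hub package is installed (pip install huggingface_hub).
from huggingface_hub import HfApi

api = HfApi()

# filter="4bit" matches repositories tagged "4bit"; sort/limit keep the output short.
for model in api.list_models(filter="4bit", sort="downloads", direction=-1, limit=30):
    print(f"{model.id} • {model.pipeline_tag} • downloads={model.downloads} • likes={model.likes}")
```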
NeoChen1024/internlm2_5-20b-chat-exl2-4.25bpw-h8 • Text Generation • Updated • 2
ussipan/SipanGPT-0.1-Llama-3.2-1B-GGUF • Text Generation • 1B • Updated • 23 • 1
ussipan/SipanGPT-0.2-Llama-3.2-1B-GGUF • Text Generation • 1B • Updated • 25
mcavus/glm-4v-9b-gptq-4bit-dynamo • 3B • Updated • 4 • 1
ussipan/SipanGPT-0.3-Llama-3.2-1B-GGUF • Text Generation • 1B • Updated • 32 • 1
harishnair04/Gemma-medtr-2b-sft • Text Generation • 2B • Updated • 2
harishnair04/Gemma-medtr-2b-sft-v2 • Text Generation • 3B • Updated • 2
mradermacher/Gemma-medtr-2b-sft-v2-GGUF • 3B • Updated • 18
NaomiBTW/L3-8B-Lunaris-v1-GPTQ • Text Generation • Updated
ModelCloud/Qwen2.5-Coder-32B-Instruct-gptqmodel-4bit-vortex-v1 • Text Generation • 7B • Updated • 17 • 15
ModelCloud/QwQ-32B-Preview-gptqmodel-4bit-vortex-v1 • Text Generation • 7B • Updated • 8 • 51
nisten/qwen2.5-coder-7b-abliterated-128k-AWQ • Text Generation • 2B • Updated • 21
ModelCloud/QwQ-32B-Preview-gptqmodel-4bit-vortex-v2 • Text Generation • 7B • Updated • 4 • 16
ModelCloud/QwQ-32B-Preview-gptqmodel-4bit-vortex-v3 • Text Generation • 7B • Updated • 6 • 14
mlx-community/Qwen2.5-7B-Instruct-kowiki-qa-4bit • Text Generation • 1B • Updated • 2
ModelCloud/Falcon3-10B-Instruct-gptqmodel-4bit-vortex-v1 • Text Generation • 2B • Updated • 4 • 3
adriabama06/SmallThinker-3B-Preview-AWQ • Text Generation • Updated • 2 • 1
exxocism/Linkbricks-Horizon-AI-Llama-3.3-Korean-70B-sft-dpo-GGUF • Text Generation • Updated
ehristoforu/Phi4-MoE-2x14B-Instruct • Text Generation • 14B • Updated • 5
ModelCloud/Qwen2.5-0.5B-Instruct-gptqmodel-4bit • Text Generation • 0.3B • Updated • 16 • 1
ModelCloud/DeepSeek-R1-Distill-Qwen-7B-gptqmodel-4bit-vortex-v1 • Text Generation • 2B • Updated • 6 • 5
ModelCloud/DeepSeek-R1-Distill-Qwen-7B-gptqmodel-4bit-vortex-v2 • Text Generation • 2B • Updated • 1.02k • 7
vital-ai/watt-tool-70B-awq • 11B • Updated • 9.35k • 4
curiousmind147/microsoft-phi-4-AWQ-4bit-GEMM • Text Generation • 3B • Updated • 10.5k • 1
ConfidentialMind/Mistral-Small-24B-Instruct-2501_GPTQ_G128_W4A16_MSE • Text Classification • 4B • Updated • 66 • 1
ConfidentialMind/Virtuoso-Medium-v2_GPTQ_G128_W4A16 • Text Generation • 6B • Updated • 2
ConfidentialMind/Virtuoso-Medium-v2_GPTQ_G32_W4A16 • Text Generation • 7B • Updated • 2
ConfidentialMind/Mistral-Small-24B-Instruct-2501_GPTQ_G32_W4A16 • Text Generation • 5B • Updated • 3 • 1
ConfidentialMind/Rombos-LLM-V2.6-Qwen-14b_GPTQ_G32_4bit_MSE • Text Generation • 4B • Updated • 3
ConfidentialMind/Arcee-Blitz-GPTQ-G32-W4A16-MSE • Text Generation • 5B • Updated • 2
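The entries above mix several 4-bit formats (GGUF, GPTQ, AWQ, EXL2, MLX), each of which needs a matching backend to load. As one hedged example, the sketch below loads one of the GPTQ checkpoints from the list with transformers; it assumes transformers plus a GPTQ backend (e.g. gptqmodel or auto-gptq) and a CUDA GPU are available, and the prompt is purely illustrative.

```python
# Sketch only: load a 4-bit GPTQ checkpoint from the listing above.
# Assumes transformers plus a GPTQ backend (e.g. gptqmodel or auto-gptq) and a CUDA GPU.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "ModelCloud/Qwen2.5-Coder-32B-Instruct-gptqmodel-4bit-vortex-v1"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
# The checkpoint is already quantized to 4 bits, so no extra quantization config is passed here.
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")

inputs = tokenizer("Write a function that reverses a string.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

GGUF repositories in the list (e.g. the ussipan/SipanGPT-*-GGUF entries) are instead intended for llama.cpp-compatible runtimes rather than this loading path.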