Models: 2,464

Active filters: quantized

crazymaker1122/Llama-3.3-8B-Instruct-awq-direct

Text Generation • 2B • Updated Jul 9 • 19

steampunque/Llama-3.1-8B-Instruct-Hybrid-GGUF

8B • Updated Jul 9 • 7

steampunque/Llama-3.3-70B-Instruct-Hybrid-GGUF

71B • Updated Jul 11 • 30

steampunque/ultravox-v0_5-llama-3_3-70b-Hybrid-GGUF

0.7B • Updated Jul 10 • 13

steampunque/ultravox-v0_6-llama-3_3-70b-Hybrid-GGUF

0.7B • Updated Jul 10 • 13

magicunicorn/gemma-3-27b-npu-quantized

Text Generation • Updated Jul 10

rs-test/llama-scout-fp8

Image-Text-to-Text • 109B • Updated Jul 11 • 300

Makatia/mistral-7b-instruct-v0.2.Q8_0-Q8_0.gguf

7B • Updated about 1 month ago • 34

Makatia/microsoft_Phi-3-mini-4k-instruct_onnx_rpi

Updated about 1 month ago

Makatia/TinyLlama_TinyLlama-1.1B-Chat-v1.0_onnx

Updated about 1 month ago • 1

JonathanMiddleton/Qwen3-Reranker-4B-GGUF

Text Ranking • 4B • Updated 30 days ago • 147

ramblingpolymath/Qwen3-32B-W8A8

Text Generation • 33B • Updated 10 days ago • 242

steampunque/Deepseek-R1-Distill-Llama-8B-Hybrid-GGUF

8B • Updated 29 days ago • 27

ramblingpolymath/Qwen3-14B-W8A8

Text Generation • 15B • Updated 10 days ago • 23

ramblingpolymath/Qwen3-8B-W8A8

Text Generation • 8B • Updated 10 days ago • 7

ramblingpolymath/Qwen3-4B-W8A8

Text Generation • 4B • Updated 10 days ago • 11

RedHatAI/Kimi-K2-Instruct-quantized.w4a16

Text Generation • Updated 26 days ago • 8.12k • 8

adamrb/mpt-30b-chat-w4a16-gptq

4B • Updated 28 days ago • 9

adamrb/mpt-30b-chat-w8a8-gptq

8B • Updated 28 days ago • 10

ramblingpolymath/qwen3-30B-A3B-w8a8

Text Generation • 31B • Updated 10 days ago • 126

ramblingpolymath/Qwen3-0.6B-W8A8

Text Generation • 0.8B • Updated 10 days ago • 138

tachyphylaxis/DeepSeek-R1-0528-FP4

Text Generation • Updated 26 days ago • 84

SandLogicTechnologies/MedGemma-4B-IT-GGUF

4B • Updated 15 days ago • 435 • 2

PJEDeveloper/Mistral_Nemo_Instruct_2407-F16.gguf-Q4_K_M

12B • Updated 20 days ago • 55

sdurgi/bert_emotion_response_classifier_quantized

Text Classification • Updated 25 days ago • 7

mirekphd/whisper-large-v3-onnx-fp16

Automatic Speech Recognition • Updated 24 days ago • 2

mirekphd/whisper-large-v3-onnx-w8a16-dynamic

Automatic Speech Recognition • Updated 24 days ago • 6

mirekphd/whisper-large-v3-onnx-w4a16-dynamic

Automatic Speech Recognition • Updated 24 days ago • 4

steampunque/Qwen2.5-VL-32B-Instruct-Hybrid-GGUF

0.7B • Updated 24 days ago • 54

theprint/Zeth-Gemma3-4B-GGUF

Text Generation • 5B • Updated 23 days ago • 133