Llama-3.2 Quantization - a neuralmagic Collection

Llama-3.2 Quantization

Updated Sep 26, 2024

Llama 3.2 models quantized by Neural Magic

RedHatAI/Llama-3.2-11B-Vision-Instruct-FP8-dynamic
Text Generation • 11B • Updated Oct 2, 2024 • 1.36k • 24

RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic
Text Generation • 89B • Updated Oct 2, 2024 • 2.96k • 10

RedHatAI/Llama-3.2-1B-Instruct-FP8-dynamic
Text Generation • 1B • Updated Oct 9, 2024 • 38.4k • 3

RedHatAI/Llama-3.2-3B-Instruct-FP8-dynamic
Text Generation • 4B • Updated Oct 9, 2024 • 1.2k • 3

RedHatAI/Llama-3.2-1B-Instruct-quantized.w8a8
Text Generation • 1B • Updated Oct 16, 2024 • 41.6k • 7

RedHatAI/Llama-3.2-3B-Instruct-quantized.w8a8
Text Generation • 4B • Updated Jul 10 • 1.17k • 1

RedHatAI/Llama-3.2-1B-Instruct-FP8
Text Generation • 1B • Updated Oct 16, 2024 • 2.08k • 3

RedHatAI/Llama-3.2-3B-Instruct-FP8
Text Generation • 4B • Updated Oct 16, 2024 • 1.67k • 6

RedHatAI/Llama-3.2-1B-FP8
1B • Updated Oct 9, 2024 • 7