Meta's Llama 3.3 70B Instruct Quantized Model

This repository contains the Q4_K_M ("k-quant medium") quantized GGUF version of Meta's Llama 3.3 70B Instruct model, prepared by SandLogic Technologies. The quantization dramatically reduces the model size while preserving most of its performance, making it suitable for deployment on consumer hardware.
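
One way to fetch the quantized weights locally is the huggingface_hub client; the sketch below assumes a GGUF filename that should be verified against the repository's actual file listing.

```python
# Download sketch using huggingface_hub (pip install huggingface_hub).
# The filename is an assumption; check the repository for the exact GGUF name.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="SandLogicTechnologies/Llama-3.3-70B-Instruct-GGUF",
    filename="Llama-3.3-70B-Instruct-Q4_K_M.gguf",  # assumed filename
)
print(f"Saved to: {model_path}")
```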

Performance Characteristics

The Q4_K_M quantization offers an excellent balance between model size, memory usage, and quality (a local-inference sketch follows the list below):

  • Preserves approximately 97-98% of the original model's performance
  • Reduces the model size from ~140GB (FP16) to ~39GB
  • Enables inference on consumer-grade GPUs with 12GB+ VRAM by offloading part of the layers to the GPU and running the rest on the CPU
  • Suitable for CPU inference on systems with 48GB+ RAM
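
As a minimal local-inference sketch, the following uses the llama-cpp-python bindings; the GGUF filename, the n_gpu_layers value, and the prompt are assumptions to adjust for your download and hardware.

```python
# Minimal inference sketch with llama-cpp-python (pip install llama-cpp-python).
# The model filename and n_gpu_layers value are assumptions: point model_path at
# your downloaded GGUF file and raise/lower n_gpu_layers to fit your VRAM.
from llama_cpp import Llama

llm = Llama(
    model_path="Llama-3.3-70B-Instruct-Q4_K_M.gguf",  # assumed filename
    n_gpu_layers=20,  # partial offload for a ~12GB GPU; -1 offloads every layer
    n_ctx=4096,       # context window; larger values need more memory
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain 4-bit quantization in two sentences."},
    ],
    max_tokens=256,
)
print(response["choices"][0]["message"]["content"])
```

Layers that are not offloaded to the GPU stay in system RAM, which is why the 48GB+ RAM guideline above applies to CPU-leaning configurations.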

Model Details

  • Format: GGUF
  • Model size: 70.6B params
  • Architecture: llama
  • Quantization: 4-bit (Q4_K_M)