Meta's Llama 3.3 70B Instruct Quantized Model

This repository contains the Q4_K_M ("k-quant medium") quantized GGUF version of Meta's Llama 3.3 70B Instruct model, prepared by SandLogic Technologies. The quantization dramatically reduces the model size while preserving most of its performance, making it suitable for deployment on consumer hardware.
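
One way to fetch the quantized weights locally is the huggingface_hub client; the sketch below assumes a GGUF filename that should be verified against the repository's actual file listing.

```python
# Download sketch using huggingface_hub (pip install huggingface_hub).
# The filename is an assumption; check the repository for the exact GGUF name.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="SandLogicTechnologies/Llama-3.3-70B-Instruct-GGUF",
    filename="Llama-3.3-70B-Instruct-Q4_K_M.gguf",  # assumed filename
)
print(f"Saved to: {model_path}")
```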

Performance Characteristics

The Q4_K_M quantization offers an excellent balance between model size, memory usage, and quality (a local-inference sketch follows the list below):

  • Preserves approximately 97-98% of the original model's performance
  • Reduces the model size from ~140GB (FP16) to ~39GB
  • Enables inference on consumer-grade GPUs with 12GB+ VRAM by offloading part of the layers to the GPU and running the rest on the CPU
  • Suitable for CPU inference on systems with 48GB+ RAM
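
As a minimal local-inference sketch, the following uses the llama-cpp-python bindings; the GGUF filename, the n_gpu_layers value, and the prompt are assumptions to adjust for your download and hardware.

```python
# Minimal inference sketch with llama-cpp-python (pip install llama-cpp-python).
# The model filename and n_gpu_layers value are assumptions: point model_path at
# your downloaded GGUF file and raise/lower n_gpu_layers to fit your VRAM.
from llama_cpp import Llama

llm = Llama(
    model_path="Llama-3.3-70B-Instruct-Q4_K_M.gguf",  # assumed filename
    n_gpu_layers=20,  # partial offload for a ~12GB GPU; -1 offloads every layer
    n_ctx=4096,       # context window; larger values need more memory
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain 4-bit quantization in two sentences."},
    ],
    max_tokens=256,
)
print(response["choices"][0]["message"]["content"])
```

Layers that are not offloaded to the GPU stay in system RAM, which is why the 48GB+ RAM guideline above applies to CPU-leaning configurations.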

Model Details

  • Format: GGUF
  • Model size: 70.6B params
  • Architecture: llama
  • Quantization: 4-bit (Q4_K_M)